### IOWA STATE UNIVERSITY Digital Repository

Retrospective Theses and Dissertations

Iowa State University Capstones, Theses and Dissertations

1997

## CMOS circuits for high speed serial data communication

Xiaoyu Xi Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/rtd

Part of the <u>Data Storage Systems Commons</u>, <u>Digital Circuits Commons</u>, <u>Digital Communications and Networking Commons</u>, and the <u>Systems and Communications Commons</u>

#### Recommended Citation

Xi, Xiaoyu, "CMOS circuits for high speed serial data communication" (1997). Retrospective Theses and Dissertations. 16796. https://lib.dr.iastate.edu/rtd/16796

This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact digirep@iastate.edu.



#### CMOS circuits for high speed serial data communication

by

#### Xiaoyu Xi

A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

Major: Electrical Engineering
Major Professor: William C. Black

Iowa State University
Ames, Iowa
1997

Copyright © Xiaoyu Xi, 1997, All rights reserved.

### Graduate College Iowa State University

This is to certify that the Master's thesis of Xiaoyu Xi has met the thesis requirements of Iowa State University

Signatures have been redacted for privacy

### TABLE OF CONTENTS

|    | ACKNOWLEDGEMENTS                                                        | ix |
|----|-------------------------------------------------------------------------|----|
|    | ABSTRACT                                                                | x  |
| 1  | GENERAL INTRODUCTION                                                    | 1  |
|    | 1.1 Introduction                                                        | 1  |
|    | 1.2 Thesis Organization                                                 | 2  |
|    | PART I A HIGH PERFORMANCE PARALLEL-TO-SERIAL CONVERTER                  | 3  |
| 2  | INTRODUCTION                                                            | 4  |
| 3  | LITERATURE REVIEW                                                       | 5  |
|    | 3.1 Schemes for Parallel-to-Serial Conversion                           | 5  |
|    | 3.2 High Speed D Flip-Flops                                             | 8  |
| 4  | A NOVEL PARALLEL-TO-SERIAL CONVERTER                                    | 9  |
|    | 4.1 Converter Core -- A 5×4 D Flip-Flop Matrix                          | 9  |
|    | 4.2 Clocking Strategy                                                   | 12 |
|    | 4.3 Implementation of the Complete Architecture                         | 14 |
|    | 4.4 A High-Speed Dynamic D Flip-Flop                                    | 14 |
|    | 4.5 The Driver Stage                                                    | 16 |
|    | 4.6 SPICE Simulations                                                   | 17 |
|    | 4.7 Layout Considerations                                               | 20 |
| 5  | SUMMARY AND CONCLUSION                                                  | 23 |
|    | PART II A CHARGE-PUMP CLOCK RECOVERY CIRCUIT FOR NRZ DATA REGENERATION  | 24 |
| 6  | INTRODUCTION                                                            | 25 |
|    | 6.1 Introduction                                                        | 25 |
|    | 6.2 Definition of Terms                                                 | 25 |
| 7  | LITERATURE REVIEW                                                       | 27 |
| 8  | BASIC THEORIES                                                          | 28 |
|    | 8.1 Topology and Working Scheme                                         | 28 |
|    | 8.2 Loop Dynamic in Locked State                                        | 29 |
|    | 8.3 Charge-Pump PLL                                                     | 30 |
|    | 8.4 Clock Recovery from NRZ Data                                        | 33 |
| 9  | DESIGN OF A CHARGE-PUMP CLOCK RECOVERY CIRCUIT                          | 34 |
|    | 9.1 Voltage Controlled Oscillator (VCO)                                 | 34 |
|    | 9.2 Phase Detector                                                      | 39 |
|    | 9.3 Charge Pump and Loop Filter                                         | 42 |
|    | 9.4 Measurement of $K_{PD}$, Loop Bandwidth and Damping Factor          | 45 |
|    | 9.5 Startup Circuit                                                     | 47 |
|    | 9.6 Closed Loop Simulation                                              | 48 |
| 10 | SUMMARY AND CONCLUSION                                                  | 61 |
| 11 | GENERAL CONCLUSION AND FUTURE WORKS                                     | 62 |
|    | 11.1 Conclusion                                                         | 62 |
|    | 11.2 Future Works                                                       | 62 |
|    | REFERENCES                                                              | 64 |

#### LIST OF TABLES

| Table 4.1  | Transistor size of the D flip-flop                                        | 16 |
|------------|---------------------------------------------------------------------------|----|
| Table 4.2  | Simulated performance of the D flip-flop                                  | 16 |
| Table 4.3  | Simulated performance of the proposed converter                           | 20 |
| Table 9.1  | Transistor sizes of the feedback opamp                                    | 36 |
| Table 9.2  | Transistor sizes of the VCO gain stage                                    | 37 |
| Table 9.3  | Voltage-frequency transfer function of the VCO at 25 °C                   | 38 |
| Table 9.4  | Phase detector working states                                             | 39 |
| Table 9.5  | Transistor sizes of DFF1 and DFF2                                         | 41 |
| Table 9.6  | Timing characteristics of the DFF                                         | 41 |
| Table 9.7  | Transistor sizes of the charge pump circuit                               | 44 |
| Table 9.8  | Measured output current at different phase error                          | 46 |
| Table 10.1 | Measured performance of the designed CRC                                  | 61 |

#### LIST OF FIGURES

| Figure 1.1  | Topology of transceiver                                | 2  |
|-------------|--------------------------------------------------------|----|
| Figure 3.1  | Parallel-to-serial converter (shift scheme)            | 5  |
| Figure 3.2  | Timing relationship (shift scheme)                     | 5  |
| Figure 3.3  | Parallel-to-serial converter (selection scheme)        | 6  |
| Figure 3.4  | Timing diagram (selection scheme)                      | 7  |
| Figure 3.5  | TSPC DFF                                               | 8  |
| Figure 4.1  | Proposed parallel-to-serial converter                  | 10 |
| Figure 4.2  | Clocking strategy for proposed converter               | 11 |
| Figure 4.3  | 5 stage VCO and 10-phase clocks                        | 12 |
| Figure 4.4  | Generation of 5-phase pulses                           | 12 |
| Figure 4.5  | Generation of LDCLK and LDCLKbar                       | 13 |
| Figure 4.6  | Block diagram of the converter                         | 14 |
| Figure 4.7  | A ratioed TSPC DFF                                     | 15 |
| Figure 4.8  | A pseudo random data generator                         | 15 |
| Figure 4.9  | Driver stage with ENABLE control                       | 17 |
| Figure 4.10 | Block diagram of the complete converter                | 17 |
| Figure 4.11 | Top level schematic of the transmitter                 | 18 |
| Figure 4.12 | Schematic of the complete parallel-to-serial converter | 18 |
| Figure 4.13 | Timing of clocks measured from simulation              | 19 |
| Figure 4.14 | Serial output waveform                                 | 19 |
| Figure 4.15 | Clock path for serial shifting cells                   | 20 |
| Figure 4.16 | Model of gate RC delay                                 | 21 |
| Figure 4.17 | Layout of one big inverter                             | 21 |
| Figure 4.18 | Layout of the converter including the driver stage     | 22 |
| Figure 4.19 | Layout of the complete transmitter chip                | 22 |

| Figure 6.1  | NRZ data                                                   | 26 |
|-------------|------------------------------------------------------------|----|
| Figure 8.1  | Basic phase-locked loop                                    | 28 |
| Figure 8.2  | Linear model of a PLL                                      | 29 |
| Figure 8.3  | Charge pump PLL                                            | 31 |
| Figure 8.4  | Loop filter with zero and ripple suppression               | 32 |
| Figure 9.1  | A simple topology of ring oscillator                       | 34 |
| Figure 9.2  | Designed ring oscillator with self-adjusted 50% duty-cycle | 35 |
| Figure 9.3  | Frequency response of the feedback opamp                   | 36 |
| Figure 9.4  | Gain stage of the ring oscillator                          | 36 |
| Figure 9.5  | Transfer function of the designed ring oscillator          | 38 |
| Figure 9.6  | Clock waveform at 25 °C                                    | 38 |
| Figure 9.7  | Clock waveform at 60 °C                                    | 39 |
| Figure 9.8  | Phase detector used in the CRC                             | 40 |
| Figure 9.9  | Signal waveforms of the phase detector while in lock       | 41 |
| Figure 9.10 | D flip-flop used in the phase detector                     | 41 |
| Figure 9.11 | XOR gate used in the phase detector                        | 42 |
| Figure 9.12 | A previously reported charge pump circuit                  | 43 |
| Figure 9.13 | Modified charge pump with two NMOS inputs                  | 43 |
| Figure 9.14 | Loop filter used in the phase detector                     | 45 |
| Figure 9.15 | $\Delta \phi$ -I transfer function of the phase detector   | 46 |
| Figure 9.16 | Designed startup circuit to precharge the loop filter      | 47 |
| Figure 9.17 | Closed loop schematic                                      | 48 |
| Figure 9.18 | CRC with startup circuit                                   | 49 |
| Figure 9.19 | DFT of the recovered clock                                 | 49 |
| Figure 9.20 | Waveform of the recovered clock                            | 50 |
| Figure 9.21 | Transient current of the power supply                      | 50 |
| Figure 9.22 | Average power dissipation of CRC                           | 50 |
| Figure 9.23 | Regenerated data at 1.1Gbps, 25 °C                         | 51 |
| Figure 9.24 | Regenerated data at 1.1Gbps, 60 °C                         | 51 |
| Figure 9.25 | Configuration 1 for testing the impulse capture range      | 52 |
| Figure 9.26 | Step control voltage at 25 °C                              | 53 |
| Figure 9.27 | DFT of the recovered clocks at 25 °C                       | 53 |
| Figure 9.28 | Step control voltage at 60 °C                                | 54 |
| Figure 9.29 | DFT of the recovered clocks at 60 °C                         | 54 |
| Figure 9.30 | Configuration 2 for testing the impulse capture range        | 55 |
| Figure 9.31 | Control voltage to the VCO at 25 °C                          | 55 |
| Figure 9.32 | DFT of the recovered clocks at 25 °C                         | 55 |
| Figure 9.33 | Control voltage to the VCO at 60 °C                          | 56 |
| Figure 9.34 | DFT of the recovered clocks at 60 °C                         | 56 |
| Figure 9.35 | Control voltage at 25 °C                                     | 57 |
| Figure 9.36 | DFT of the clocks at the highest and lowest frequency        | 57 |
| Figure 9.37 | Time piece at the lowest frequency, 25 °C                    | 58 |
| Figure 9.38 | Time piece at the highest frequency, 25 °C                   | 58 |
| Figure 9.39 | Control voltage at 60 °C                                     | 59 |
| Figure 9.40 | DFT of the clocks at the highest and lowest frequency        | 59 |
| Figure 9.41 | Time piece at the lowest frequency, 60 °C                    | 60 |
| Figure 9.42 | Time piece at the highest frequency, 60 °C                   | 60 |

#### **ACKNOWLEDGEMENTS**

I would like to express my sincerest appreciation to my major advisor, Dr. William Black, who gave me the opportunity to join our department and led me to the world of IC design. I cannot be more grateful for all the guidance, encouragement and thoughtfulness he offered throughout my research and study.

I would also like to thank Dr. Randall Geiger, Dr. Edward Lee and Dr. Marwan Hassoun who were always there to help me and gave me valuable suggestions. I also want to thank Dr. Akhilesh Tyagi for being on my committee and willing to help.

I appreciate our sponsors, Rocketchips and TI, who offered me a terrific research project and supported my study here. I would also like to thank Dr. Bernard Grung, Raymond Johnson and Scott Irvin for reviewing and commenting on my thesis.

Throughout my Master's program, I have been working with a friendly and helpful team of colleagues. I would like to thank Jing Cao, Yiqin Chen and Baiying Yu, who shared the joys and pains with me as partners in those great courses. I would also like to thank Lin Wu and Jian Zhou for their discussions with me on this work, Satyaki Koneru for writing some C programs, and Sudha Nagaravapu and Arathi Iyer for their contributions to the driver stage. I would also like to thank Maofeng Lan, Jie Yan and all my friends who made my academic life at ISU enjoyable and memorable.

Finally, I want to dedicate my deepest thanks to my family: my parents, who always support and encourage me with their great love; and my brothers, on whom I can always count for help or suggestions.

#### ABSTRACT

With the fast growth of computer power and network services, high performance peripherals and interfaces are being developed to meet system needs. High speed serial data communication devices are especially important, and much effort has been spent to increase the bandwidth, reduce the cost, lower the power dissipation and improve the level of integration. Driven by these motivations, various commercial products with data rates ranging from several hundred Mbps to several Gbps have been developed in GaAs or Si bipolar technologies. Potentially less expensive silicon CMOS implementations of these functions are presently being investigated and are the focus of this work.

The serial data link has been dominant in network connections, such as Ethernet, FDDI or ATM, and will find even wider application in the future. This work focuses on the implementation of some basic building blocks for very high speed serial data communication using CMOS technology. At the transmitter end, a novel parallel-to-serial converter is presented, whose core is a 5 by 4 register matrix which combines the selection and shift schemes for conversion. Using this architecture, pipelined data loading is applied to multiple data paths running at a sub-multiple of the data rate. This significantly reduces the dynamic power dissipation and eases the system design. The special clocking strategy used to drive the converter is investigated in this work. SPICE simulations and a chip layout of this method are presented. Various design and layout considerations are also discussed. At the receiver end, a clock recovery circuit based on a charge pump phase locked loop is proposed. The VCO has an internal feedback loop to adjust the duty cycle at different frequencies. A simple conventional phase detector with a limited output swing is used to achieve the high working frequency. A modified symmetrical charge pump circuit improves the linearity of the output current versus the input UP and DOWN signals. Some characteristics such as impulse capture range and lock range are estimated from the simulations. All the designs assume a 0.5 $\mu$m single-poly triple-metal CMOS technology.

This work presents an analysis of the system as well as its individual parts and explores the possibility of CMOS solutions for gigabit data communication. It could be used or adapted in future designs for similar applications.

#### 1 GENERAL INTRODUCTION

#### 1.1 Introduction

The growing demand for network services, such as electronic mail, voice mail or file sharing, has strongly pushed the development of high speed data communication interfaces. Among them, the serial data link, due to its simple and efficient hardware configuration, has been dominant in various network connections. The transceiver, which is the core hardware for a serial data link, has experienced fast growth in recent years. Previous works achieving from several hundred Mbps to several Gbps, implemented in either GaAs or Si bipolar technology, have been reported. [2][12][13] Thanks to the ever shrinking feature size of CMOS technology, full CMOS solutions for high speed transceivers have become possible. [3][4][5][6] The benefits are a higher level of integration, lower cost, lower power dissipation and logic levels compatible with further digital circuits. This work discusses the design of CMOS circuits for the parallel-to-serial converter and clock recovery used in the transceiver.

The function of a transceiver is to transmit parallel data over a serial link and to regenerate the parallel data from the serial link. A typical block diagram is shown in Figure 1.1.

The transmitted serial data is precisely timed by the on-chip generated clock which sets the data rate. At the receiver end, two fundamental functions should be realized: recover the clock from the data sequence and then regenerate the data by sampling at an optimum time. The major challenge for design is the required high frequency versus the limitation of available CMOS technology.



Figure 1.1 Topology of transceiver

#### 1.2 Thesis Organization

This thesis comprises two parts: (1) a parallel-to-serial converter and (2) a clock recovery circuit. Both parts are organized in the following order. First an introduction is given, with definition of terms if necessary. Then a brief review introduces some previous works and basic theories. Following the review is the analysis and design of the individual circuits. At the end of each part is a short conclusion. Finally a summary completes the entire thesis and suggests some future works.

# PART I A HIGH PERFORMANCE PARALLEL-TO-SERIAL CONVERTER

#### 2 INTRODUCTION

The parallel-to-serial converter is an important functional block of a transmitter. Basically, it latches the parallel input words and converts them into a serial data sequence, precisely timed by the transmit clock. The serial data rate is roughly equal to the parallel word rate times the bit width. For most applications, especially ultra high speed data links such as Fibre Channel and Gigabit Ethernet, a differential data stream is preferred since it helps to improve the S/N ratio, suppresses noise caused by switching drivers and minimizes crosstalk. [24]
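As a worked instance of this relation, the following uses the 1.0625 Gbps serial rate and the 20/10 bit word widths that appear in Chapter 4. The symbols $f_{\mathrm{serial}}$, $f_{\mathrm{word}}$ and $N$ are introduced here for illustration only.

$$
f_{\mathrm{serial}} = N \cdot f_{\mathrm{word}}
\quad\Longrightarrow\quad
f_{\mathrm{word}} = \frac{1.0625\ \mathrm{Gbps}}{20} = 53.125\ \mathrm{MHz}\ \ (N = 20),
\qquad
f_{\mathrm{word}} = \frac{1.0625\ \mathrm{Gbps}}{10} = 106.25\ \mathrm{MHz}\ \ (N = 10)
$$

These values are close to the nominal 50 MHz and 100 MHz loading clocks discussed in Chapter 4.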

#### 3 LITERATURE REVIEW

This chapter first compares some popular schemes for parallel-to-serial conversion. Their advantages and drawbacks are discussed individually. As data (D) flip-flops conventionally serve as the basic cells of the converter, some reported work on high frequency D flip-flops is also introduced.

#### 3.1 Schemes for Parallel-to-Serial Conversion

The most conventional implementation of a parallel-to-serial converter is a chain of D flip-flops which perform the conversion. [2][21] Figure 3.1 is a simplified diagram.



Figure 3.1 Parallel-to-serial converter (shift scheme)



Figure 3.2 Timing relationship (shift scheme)

Parallel level one latches can be added to sample and hold the data for the converter. The 2-to-1 multiplexers (MUX) control the input source for each D flip-flop, which can be either the parallel data from the level one latches or the shifted output from the previous stage. The timing relationship between the clocks for parallel loading and serial shifting should be precisely controlled. For instance, as illustrated in Figure 3.2, the shifting edge could be offset at the end of the loading period.
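The load-then-shift behaviour described above can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the function name `shift_scheme_serialize` is hypothetical, bits are assumed to leave the chain LSB first, and no timing or clock skew is modelled.

```python
def shift_scheme_serialize(words, width):
    """Behavioural model of the shift scheme: parallel load, then bit-rate shifting.

    Assumption (not from the thesis): the least significant bit sits in the
    output stage of the chain, so words are sent LSB first.
    """
    serial = []
    for word in words:
        # Loading clock active: each MUX selects the level-one latch output.
        chain = [(word >> i) & 1 for i in range(width)]  # chain[0] is the output stage
        for _ in range(width):
            serial.append(chain[0])       # the output-stage bit appears on the serial line
            chain = chain[1:] + [0]       # each MUX selects the previous stage: shift by one
    return serial


if __name__ == "__main__":
    print(shift_scheme_serialize([0b1011, 0b0010], width=4))
```

Every flip-flop in `chain` is updated on every bit clock, which mirrors the drawback discussed next: the whole chain runs at the full serial rate.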

The shift scheme requires fewer clock signals and simpler logic, both of which are preferable for high speed applications. However, it has some inherent drawbacks: (1) all the D flip-flops in the chain work at the full data rate, hence the dynamic power dissipation is considerable because of the many data transitions; (2) the single bit clock has to drive all the D flip-flops, so its load is very large, which makes the clock driver difficult to design; (3) the VCO generating the bit clock has to run at the data rate, which is not easy to achieve at the gigabit level; and (4) high speed D flip-flops are absolutely necessary for this configuration, which makes their design challenging. Given the above considerations, a pure shift scheme is not preferable for Gbps applications.

An alternative solution is the selection scheme. In this method, all the D flip-flops are put in parallel, with a switch added at each output. A bus connects the outputs of the switches together to serialize the data. This configuration can extend to multi-levels so that a tree-like data path may be formed. The block diagram is shown in Figure 3.3 and the timing diagram in Figure 3.4.



Figure 3.3 Parallel-to-serial converter (selection scheme)



Figure 3.4 Timing diagram (selection scheme)

Level one latches can also be added prior to this converter. The multiplexer consists of the parallel switches, which are controlled by multi-phase clocks. The parallel word is loaded from the level one latches into the parallel D flip-flops, then the multi-phase clocks turn on the switches one by one to output the corresponding bits. This approach is also called time division multiplexing and is popular in high speed transceivers. [3][4][6] The loading clock and the selecting clocks should be synchronized very carefully, or the newly loaded data could destroy the last output bit. An example is the clock arrangement in Figure 3.4. This scheme requires more clock signals and more complicated logic, but the payoff is that: (1) the D flip-flops work at the much lower parallel data rate, so the constraints on their design can be loosened; (2) the VCO need not run at the serial data rate, which also eases its design; and (3) each switch is driven by one clock phase, so the total load is distributed equally among the multi-phase clocks. However, a pure selection scheme requires many clock phases for a wide word, hence a VCO comprising many stages is necessary. Also, the number of clock phases must equal the word bit width, so the scheme is not flexible when the bit width varies.
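For comparison with the shift scheme, the selection scheme can be sketched in the same behavioural style. The function name `selection_scheme_serialize` and the LSB-first ordering are assumptions made for this illustration; the one-hot list `enabled` stands in for the multi-phase clocks.

```python
def selection_scheme_serialize(words, width):
    """Behavioural model of time-division multiplexing onto a shared bus.

    Assumption (not from the thesis): phase k selects bit k, so words are
    sent LSB first. One clock phase is needed per bit of the word.
    """
    bus = []
    for word in words:
        # The parallel flip-flops load once per word, at the slow word rate.
        held = [(word >> i) & 1 for i in range(width)]
        for phase in range(width):
            # Exactly one switch is closed during each phase.
            enabled = [k == phase for k in range(width)]
            bus.append(held[enabled.index(True)])
    return bus


if __name__ == "__main__":
    print(selection_scheme_serialize([0b1011, 0b0010], width=4))
```

Here the stored bits never move between flip-flops; only the switch selection changes at the serial rate, at the cost of needing `width` distinct clock phases.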

#### 3.2 High speed D flip-flops

Though the conversion scheme may differ, the most common cell for a converter is the D flip-flop. As a very important building cell for digital circuits, the high speed D flip-flop has been continuously investigated and optimized. Among the reported designs, dynamic D flip-flops dominate in high frequency applications. One popular architecture is the Yuan-Svensson True-Single-Phase-Clock (TSPC) D flip-flop, whose schematic is shown in Figure 3.5.



Figure 3.5 TSPC DFF

The major advantage of this D flip-flop is that it needs only one clock signal, whereas conventional D flip-flops require complementary clocks to drive the master and slave stages respectively. Therefore, the TSPC strategy inherently avoids clock skew except for delay problems, and hence can work at a higher frequency. However, this type of D flip-flop has the drawback of being sensitive to the slope of the clock, in which case the output data may be destroyed by a poor triggering clock edge. An improved version of the Yuan-Svensson D flip-flop achieving higher frequency was reported by Qiuting Huang and Robert Rogenmoser in 1996. But the sensitivity to clock slope remained a problem, which could be even worse for ultra-high speed data processing.

#### 4 A NOVEL PARALLEL-TO-SERIAL CONVERTER

This chapter describes a 20/10 bit wide parallel-to-serial converter which incorporates a D flip-flop matrix. A top-down analysis and design is presented. Some supporting logic blocks are introduced when necessary. SPICE simulations at the schematic level follow the design and show some of its characteristics. Finally, several layout considerations are discussed.

#### 4.1 Converter Core -- A 5×4 D Flip-Flop Matrix

Two popular conversion schemes, selection and shift, were discussed in the previous chapter. As mentioned earlier, the ultra high data rate challenges the shift scheme, and the 20/10 bit dual mode makes the generation of variable-phase clocks more complicated, so neither a purely shifting nor a purely selective scheme would serve our application well. However, a combination of the two schemes may be a suitable solution. Here we propose a novel architecture for parallel-to-serial conversion. The core circuit is a 5 row by 4 column D flip-flop matrix. A simplified block diagram of the converter is shown in Figure 4.1.

Note that the leftmost DFFs in each row can be omitted if the level one latches are replaced by D flip-flops, and this is what we have done in our circuits.

The basic idea behind the design is that, to switch conveniently between 20 bit mode and 10 bit mode, we need not change the number of clock phases controlling the output switches. Instead, we can adjust the frequency of the loading clock which controls the 15 MUXes. In 20 bit mode, all four serial bits on each data path are shifted out, while in 10 bit mode only the two rightmost bits are shifted out and the other two are overwritten by newly loaded data. Thus the 20 bit and 10 bit modes can easily be switched by selecting the loading clock. The 5 data paths each work at 200 MHz, which eases the design of the VCO. The output switches are controlled by the 5 phase clock, that is, 5 time-division consecutive pulses. Each pulse has a width of 1 ns and a period of 5 ns. The timing of the 5 phase pulses is shown in Figure 4.2. Thus by turning on the 5 switches cyclically, an equivalent data rate of 1.0625 Gbps can be obtained on the bus. In this case, the only components running at the full data rate are the MUXes and the output driver. Also, the lower frequency on each path means fewer data transitions, hence the dynamic power dissipation is reduced by a factor of 5.
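The combined selection-and-shift operation can be illustrated with a behavioural model of the 5×4 matrix. This is a sketch under assumptions, not the thesis circuit: the names `load` and `serialize` are hypothetical, column 0 is taken to be the output end of each path, and the bit placement follows the statement in Section 4.2 that pulse0 controls bit0, bit5, bit10 and bit15.

```python
ROWS, COLS = 5, 4  # 5 data paths, 4 flip-flops per path


def load(word, bits):
    """Place bit (row + 5*col) of the word in column `col` of path `row`.

    Column 0 is assumed to be the output end of each path. In 10-bit mode
    only columns 0 and 1 receive data.
    """
    matrix = [[0] * COLS for _ in range(ROWS)]
    for i in range(bits):
        matrix[i % ROWS][i // ROWS] = (word >> i) & 1
    return matrix


def serialize(words, bits):
    """Serialize `bits`-wide words (bits = 20 or 10), bit0 first."""
    shifts_per_load = bits // ROWS            # 4 in 20-bit mode, 2 in 10-bit mode
    serial = []
    for word in words:
        matrix = load(word, bits)             # loading clock: MUXes pass parallel data
        for _ in range(shifts_per_load):
            for row in range(ROWS):           # pulses 0..4 close the output switches in turn
                serial.append(matrix[row][0])
            for row in range(ROWS):           # shift clock: each path moves one bit along
                matrix[row] = matrix[row][1:] + [0]
    return serial


if __name__ == "__main__":
    print(serialize([0xA5C3F], bits=20))
    print(serialize([0x2B5], bits=10))
```

Only `shifts_per_load` differs between the two modes, which reflects the point made above: the five selection phases stay the same and only the loading clock rate changes.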

Another advantage of such a scheme is that pipelined data loading can be applied, which requires a set of carefully arranged clocks. Figure 4.2 shows the clock set we used in our design.



Figure 4.1 Proposed parallel-to-serial converter



Figure 4.2 Clocking strategy for proposed converter

The MUXes in Figure 4.1 each comprise only two NMOS transistors instead of complementary transmission gates, which simplifies the control clocking. Though the signal after the MUX is not at full CMOS logic levels, it is restored by the following D flip-flop. LDCLK and its complement LDCLKbar (not shown) control the MUX between each pair of D flip-flops. While LDCLK is low, the parallel loaded data passes the MUX and reaches the input of the D flip-flops. While LDCLKbar is low, the data shifted by the previous D flip-flop goes through the MUX to the input of the next D flip-flop. To clarify the working scheme, let us assume that the D flip-flop is negative edge triggered. If we apply SHIFTCLK1 to the lowest two data paths, we can see that the negative edge of SHIFTCLK1 leads the phase0 and phase1 pulses by a certain time interval, which guarantees that the data is stable at the output of the D flip-flop before it is selected onto the bus. Similarly, we apply SHIFTCLK2 to the middle data path and SHIFTCLK3 to the top two data paths, so that each shifting clock leads the corresponding pulses by a preset time interval and the reliability of the converter is improved.

#### 4.2 Clocking Strategy

The clocking strategy for the matrix converter seems complicated, but given a 5 stage differential ring oscillator, all the clock signals can be generated without difficulty. For example, consider the ring oscillator shown in Figure 4.3.

The 5 phase pulses can be generated simply by sending appropriate clock signals from the ring oscillator to a NAND gate so that 1 ns wide negative pulses are generated. The circuit we used is shown in Figure 4.4; it was designed, along with the VCO, by the PLL group.
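One way such pulses could arise is sketched below with ideal square waves. The tap choice is an assumption made for illustration and is not taken from Figure 4.4: ten phases spaced 0.5 ns apart with 50% duty cycle, and each NAND gate fed by two phases that are 1.5 ns apart, so that their 1 ns overlap produces the low pulse. The names `phase` and `pulse` are hypothetical.

```python
STEP_PS = 100      # simulation resolution: 100 ps per step
PERIOD = 50        # 5 ns ring-oscillator period, in steps
PHASE_STEP = 5     # 0.5 ns between adjacent phases, in steps


def phase(j, t):
    """Level of clock phase j at time step t (ideal 50% duty cycle assumed)."""
    return ((t - PHASE_STEP * j) % PERIOD) < PERIOD // 2


def pulse(k, t):
    """Active-low pulse k: NAND of two phases 1.5 ns apart (assumed tap choice)."""
    j = 2 * k
    return not (phase(j, t) and phase(j + 3, t))


# Time steps within one period during which each pulse is low.
low_steps = {k: [t for t in range(PERIOD) if not pulse(k, t)] for k in range(5)}

if __name__ == "__main__":
    for k, steps in low_steps.items():
        print(f"pulse{k}: low for {len(steps) * STEP_PS} ps per period")
```

With these assumed taps, the five low intervals do not overlap and together cover the whole 5 ns period, which is the property the output switches need.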



Figure 4.3 5 Stage VCO and 10-phase clocks



Figure 4.4 Generation of 5-phase pulses



Figure 4.5 Generation of LDCLK and LDCLKbar

The shifting clocks can be chosen from the ten phase clocks generated by the ring oscillator, but they should be well synchronized with the 50 MHz or 100 MHz loading clock. The loading clock LDCLK can be generated by the logic shown in Figure 4.5.

The input is a 100 MHz clock which can be obtained by dividing one of the 10 phase clocks or the 5 phase pulses. Inside the block there are two signal paths: in one, the 100 MHz clock is further divided to generate the 50 MHz clock; the other is simply some inverter stages that compensate for the delay introduced by the divider. To get the 50 MHz loading clock with 75% duty cycle (shown in Figure 4.2), the 100 MHz and 50 MHz clocks both go to an OR gate, whose output is the expected 50 MHz loading clock. The top OR gate is added to cancel the delay so that the negative edges of the two loading clocks are aligned as well as possible. Both clocks then go to a 2-to-1 MUX controlled by the MODE signal, so either the 50 MHz or the 100 MHz clock passes depending on the working mode.
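The 75% duty cycle that results from OR-ing the two clocks can be checked with ideal waveforms. This sketch assumes perfectly compensated delays and a divider whose output is aligned with the rising edge of the 100 MHz clock; the names `square` and `ldclk` are hypothetical, and time is sampled in 1 ns steps.

```python
def square(t_ns, period_ns):
    """Ideal 50% duty-cycle clock, high during the first half of each period."""
    return (t_ns % period_ns) < period_ns // 2


def ldclk(t_ns, mode20):
    """Loading clock selected by the MODE signal (True = 20-bit mode)."""
    clk100 = square(t_ns, 10)        # 100 MHz input clock
    clk50 = square(t_ns, 20)         # divider output, delay assumed fully compensated
    load20 = clk100 or clk50         # OR gate: 50 MHz clock that is high 75% of the time
    load10 = clk100                  # the matching OR gate is assumed to add delay only
    return load20 if mode20 else load10


# Duty cycle measured over one 20 ns period of the slower clock.
duty20 = sum(ldclk(t, True) for t in range(20)) / 20
duty10 = sum(ldclk(t, False) for t in range(20)) / 20

if __name__ == "__main__":
    print(f"20-bit mode duty cycle: {duty20:.2f}")
    print(f"10-bit mode duty cycle: {duty10:.2f}")
```

In this idealized model both loading clocks fall at t = 15 ns within each 20 ns window, consistent with the goal of aligning their negative edges.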

To correctly synchronize the loading clock, shift clocks and switch pulses, we need to start from one signal as a reference. In this design, we chose pulse0 as the reference signal, which controls bit0, bit5, bit10 and bit15. From the position of pulse0, by subtracting a small but sufficient time interval, we get the position of SHIFTCLK1, which controls the lowest two data paths. Then by adding 2 ns and 3 ns of delay, the positions of SHIFTCLK2 and SHIFTCLK3 are determined. From the negative edge of SHIFTCLK1, by subtracting another appropriate time interval, we get the position of the negative edge for both loading clocks. The minimum interval between adjacent clocks from the ring oscillator is 5 ns/10 = 0.5 ns, and this is the minimum step by which we can adjust the position of the loading or shift clocks. From simulation and observation, we chose clk1, clk5 and clk2bar as SHIFTCLK1, SHIFTCLK2 and SHIFTCLK3 respectively. The pulse0 signal is divided by 2 to serve as the reference clock for generating the loading clock.

#### 4.3 Implementation of the Complete Architecture

The complete converter comprises a first level register file which is 20 bit wide, the 5×4 D flip-flop matrix and the supporting logic for generating the necessary clock signals. A block diagram is shown in Figure 4.6.



Figure 4.6 Block diagram of the converter

#### 4.4 A High-Speed Dynamic D Flip-Flop

To ensure high performance from the parallel-to-serial converter, a fast and reliable D flip-flop is the key. Up to now, various types of TSPC D flip-flops have been reported. Among them is the one proposed by Byungsoo Chang, Joonbae Park and Wonchan Kim, shown in Figure 4.7.



Figure 4.7 A ratioed TSPC DFF

This D flip-flop was designed using ratioed logic. In the second and last stages, the drains of MN2 and MN3 will be pulled low as long as a "high" signal is applied to their gates, even though the gates of MP2 and MP3 may be pulled low. The price paid here is some static power dissipation depending on the data input. In exchange, this architecture saves two middle transistors in those two stages, compared to the Yuan-Svensson TSPC DFF. By eliminating the series resistance associated with series MOS transistors, this design inherently minimizes the time constant for charging and discharging the load capacitance and hence reduces the delay, rise and fall times of the output data. In high speed applications, the delay, rise and fall times normally set the upper limit of the working frequency. Here we do not need the D flip-flop to have a strong driving capability, so the transistor sizes are all small, which helps reduce power dissipation. A SPICE simulation was done on the following configuration, and some characteristics of interest were measured from the simulation.



Figure 4.8 A pseudo random data generator

The size of each transistor is listed in Table 4.1. MN2 and MN3 are comparatively larger than MP2 and MP3 to ensure that the ratioed logic works well under different operating conditions and temperatures.

The simulation was done at temperatures of 25 °C and 85 °C using the circuit shown in Figure 4.8. Some measured results from the simulations are listed in Table 4.2.

Table 4.1 Transistor size of the D flip-flop

|       | MP1     | MP2     | MP3   | MPbuf | MN0     | MN1     | MN2   | MN3     | MNbuf   |
|-------|---------|---------|-------|-------|---------|---------|-------|---------|---------|
| W/L (µm) | 2.4/0.6 | 1.8/0.6 | 3/0.6 | 9/0.6 | 2.4/0.6 | 2.4/0.6 | 3/0.6 | 3.6/0.6 | 5.4/0.6 |

Table 4.2 Simulated performance of the D flip-flop

|                   | 25°C                 | 85°C                 |
|-------------------|----------------------|----------------------|
| Delay             | 290ps for "0" to "1" | 380ps for "0" to "1" |
|                   | 250ps for "1" to "0" | 320ps for "1" to "0" |
| Rise Time         | 150ps                | 200ps                |
| Fall Time         | 140ps                | 170ps                |
| Power dissipation | 1mW                  | 0.82mW               |

The delay, rise and fall time are small compared with the time intervals between loading clock and shift clock. This ensures the data will be valid at the output before further processing. Also it provides room for going to a higher data rate.

#### 4.5 The Driver Stage

Since one of the design goals is accurate latency control, we would like to reduce as much as possible the unpredictable delay after the switch, which is the last stage controlled by a clock. By including the driver stage, the unpredictable delay due to the driver can be compensated by the phase-locked loop. For example, we can lock one pulse to the reference clock so that the delay between the loading clock and the output pulse is predetermined. Thus the only uncontrolled delay is that of the pseudo-emitter-coupled-logic (PECL) line driver. Complementary outputs are required from the driver stage to drive the differential PECL driver. Since its input swing need not be rail to rail, we can use only PMOS transistors as the switches, which require only negative pulses for control.

The driver stages are simply a chain of inverters, designed by the line driver group. A NOR gate and a NAND gate are added at the beginning so that an external OE signal can put the output into a known state. The schematic is shown in Figure 4.9.

The complete parallel-to-serial converter including the driver stage is shown in Figure 4.10.

#### 4.6 SPICE Simulations

SPICE simulations were done on the converter combined with the PLL block. The schematic for simulation is shown in Figure 4.11 and Figure 4.12.



Figure 4.9 Driver stage with ENABLE control



Figure 4.10 Block diagram for the complete converter



Figure 4.11 Top level schematic of the transmitter



Figure 4.12 Schematic of the complete parallel-to-serial converter

The pseudo random data generator is just a signal source with full CMOS logic levels. Figure 4.13 shows the timing relationship between the loading clock, shift clocks and pulses, to which the pipelined working scheme is applied. Figure 4.14 shows the differential serial output on the buses, whose signal swing is from 1.3V to 3.3V and matches the input of the differential line driver.



Figure 4.13 Timing of clocks measured from simulation



Figure 4.14 Serial output waveforms

Table 4.3 Simulated performance of the proposed converter

| Parameter         | Value                                |
|-------------------|--------------------------------------|
| Power Dissipation | 340 mW                               |
| Bit Latency       | 4 ns                                 |
| Rise Time         | 0.5 ns                               |
| Fall Time         | 0.5 ns                               |
| Voltage Swing     | 1.2 V-3.2 V (interlaced "0" and "1") |
|                   | 1.0 V-3.3 V (consecutive "0" or "1") |

A simulation at a temperature of 85 °C was also done and gave similar results. Some characteristics of interest (at 25 °C) are listed in Table 4.3.

#### 4.7 Layout Considerations

On each datapath, four serial D flip-flops are triggered by a single clock to perform the shift operation. During shifting, to ensure that the earlier D flip-flops are not overloaded by the later stages, we want the flip-flops to be triggered in order from the last to the first. This can be realized by routing the clock line from the last flip-flop toward the first, as shown in Figure 4.15.

For the big inverters and PMOS MUX's, the RC delay associated with the polysilicon cannot be neglected in this high speed application. An approximation model is shown in Figure 4.16.



Figure 4.15 Clock path for serial shifting cells



Figure 4.16 Model of gate RC delay

To reduce this RC delay, we lay a strip of metal 1 above the polysilicon and put polysilicon contacts along it as shown above. Also, in the direction of charging current flow, we can shrink the polysilicon width gradually since the average current is reduced. This keeps the parasitic capacitance associated with poly and metal 1 smaller. A section of layout is shown in Figure 4.17.



Figure 4.17 Layout of one big inverter
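As a rough illustration of why strapping the gate with metal helps, the polysilicon can be treated as a uniform RC ladder and its Elmore delay estimated. The sketch below is a simplified model under stated assumptions and is not the model of Figure 4.16: the sheet resistance, gate capacitance per unit area and device dimensions are placeholder values, the metal strap is taken as ideal, and each poly section between contacts is assumed to be driven from one end only.

```python
def elmore_delay(r_total, c_total, segments=100):
    """Elmore delay to the far end of a uniform RC ladder driven from one end."""
    r = r_total / segments
    c = c_total / segments
    # The k-th capacitor is charged through k series resistor segments.
    return sum(k * r * c for k in range(1, segments + 1))


def strapped_delay(r_total, c_total, sections, segments=100):
    """Delay when ideal metal contacts split the poly into equal sections,
    each driven from one end (a simplifying assumption)."""
    return elmore_delay(r_total / sections, c_total / sections, segments)


# Placeholder numbers (assumptions, not taken from the design):
R_SHEET = 25.0                   # ohm per square for polysilicon
C_GATE = 2.5e-15                 # F per um^2 of gate area
WIDTH_UM, LENGTH_UM = 60.0, 0.6  # one wide gate finger

r_poly = R_SHEET * WIDTH_UM / LENGTH_UM   # end-to-end poly resistance
c_gate = C_GATE * WIDTH_UM * LENGTH_UM    # total gate capacitance

t_plain = elmore_delay(r_poly, c_gate)
t_strapped = strapped_delay(r_poly, c_gate, sections=4)
print(f"unstrapped: {t_plain * 1e12:.2f} ps, 4 sections: {t_strapped * 1e12:.2f} ps")
```

In this model the delay falls with the square of the number of sections, because both the resistance and the capacitance of each section shrink in proportion.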

The entire layout of the parallel-to-serial converter is shown in Figure 4.18. The 20/10 bit inputs enter the cell from the top, pass down through 5 datapaths of 4 serial D flip-flops, then turn around and go up through the differential driver stage to reach the switches. The outputs after the switches and the power supply lines use very wide metal to allow large peak currents. The complete transmitter chip layout, which includes the PLL, the converter and the line driver, is shown in Figure 4.19.



Figure 4.18 Layout of the converter including the driver stage



Figure 4.19 Layout of the complete transmitter chip

#### 5 SUMMARY AND CONCLUSION

Part I described a novel architecture of a parallel-to-serial converter realized by a D flip-flop matrix core. The proposed converting scheme eases the VCO design by having it run at a sub-multiple of the serial data rate. The generation of the multiphase clocks is also simplified. By selecting the loading clock, the converter can be easily switched between 10 bit and 20 bit working modes. The pipelined data loading and shifting, which give the data extra time to settle, improve the signal stability. In addition, the total load is distributed over multiple clocks. The D flip-flops are all small so that the power dissipation is kept to a minimum.

In this part, some layout considerations were also discussed, which could be very important in high speed applications.

The SPICE simulation showed that this circuit meets the design goals, including a data rate of 1Gbps, a bit latency of 4 ns and a power dissipation of 340 mW including the PECL driver.

## PART II A CHARGE-PUMP CLOCK RECOVERY CIRCUIT FOR NRZ DATA REGENERATION

#### 6 INTRODUCTION

#### 6.1 Introduction

In most serial data communication systems, such as Ethernet and the ANSI Fibre Channel standard, data are transmitted without any specific timing reference. However, the receiver eventually must regenerate and process the data synchronously. The receiver is therefore required to extract the clock signal from the data stream, and this function is performed by a clock recovery circuit.

There are different approaches to regenerate the clock, and the most popular one employs on-chip phase-locked loops (PLL). With no external tuning elements, the signal is processed entirely on-chip. In addition, the tuning range of the local voltage controlled oscillator (VCO) can provide some compensation for the drifts introduced by the process and temperature variations. [10]

The architecture of a PLL clock recovery circuit depends on the encoding scheme of the transmitted data, such as nonreturn-to-zero (NRZ), return-to-zero (RZ) or Manchester encoding. The charge-pump PLL, however, is the most widely used for its extended frequency range and low cost.

#### 6.2 Definition of Terms

#### NRZ Data

NRZ data is a stream of rectangular pulses generated by the NRZ encoding scheme for digital data communication. The presence of a pulse represents a binary one and its absence represents a zero. This scheme is widely used in fibre-optic channels to get the highest throughput from a given channel bandwidth. An example waveform is shown in Figure 6.1.



Figure 6.1 NRZ data

#### Tuning Range

The frequency range between the maximum and minimum value which can be obtained from the VCO is called the tuning range. It should accommodate the PLL input frequency by some margin. Normally the tuning range is about 20% of the free-running frequency.<sup>[8]</sup>

#### Jitter

Variations of clock period, including random and deterministic ones, are referred to as jitter.

The upper bound on the VCO jitter and phase noise is imposed by the timing accuracy and spectral purity requirements of the PLL application.

#### Capture Range

The maximum difference between the input signal frequency and the oscillator's free-running frequency where lock can eventually be attained is called the capture range.<sup>[21]</sup>

#### Acquisition Time

The elapsed time for a PLL to lock on the input signal while its frequency is within the capture range is called the acquisition time. [21] It was shown by Blanchard that the acquisition time can be approximated as:

$$t_{acq} = \zeta \frac{(\omega_{in} - \omega_{osc})^2}{\omega_n}$$

where  $\zeta$  is the damping factor,  $\omega_n$  is the loop bandwidth. [21]

#### Lock Range

Once lock is attained, the lock range is the maximum frequency range over which the PLL remains in lock when the input signal frequency changes very slowly.

#### 7 LITERATURE REVIEW

The PLL clock recovery circuit is a key building block of the receiver in a serial data link and has been studied intensively. With the strong demand for network services, serial data rates are rising to the gigabit-per-second (Gbps) level. To achieve such speeds, compound semiconductor and Si bipolar technologies have traditionally been used. For example, a GaAs IC combined with a Si bipolar clock recovery chip achieving 2.5 Gbps was reported by R. Walker et al. [23] A monolithic Si bipolar implementation running at 1.5 Gbps was proposed by H. Ransijn and P. O'Connor. [22] Both have a power dissipation of up to 1 W, which is high for integrated circuits. At present, a line of GaAs Gigabit transceivers from VITESSE Inc. is available on the market. Meanwhile, with the fast advance of CMOS technology and the continuous shrinking of feature size, some CMOS implementations of Gbps clock recovery circuits (CRCs) have been reported. The most recent ones are a Fibre Channel compliant 1.0625 Gbps transceiver and a 1.25 Gbps transceiver for Gigabit Ethernet. [5][6] Both use 0.5 µm technology and consume less than 500 mW from a 3.3 V power supply. Compared with the compound semiconductor or Si bipolar solutions, CMOS has the advantages of a higher level of integration, lower cost, lower power dissipation and logic levels compatible with the digital circuits. It is clear that the CMOS solution for Gbps data communications is becoming the trend for the future.

#### 8 BASIC THEORIES

This chapter will discuss some basic theories about phase locked loop (PLL) and clock recovery circuits. First a generic PLL topology and principle is introduced, then a charge-pump PLL is analyzed. A small signal linear approximation is introduced to model the PLL system to gain some basic idea about its design. Finally some special requirements on clock recovery from NRZ data are discussed.

#### 8.1 Topology and Working Scheme

A phase-locked loop is a feedback system in which the feedback signal is derived from the input phase error. Basically, a PLL consists of a phase detector (PD), a low-pass filter (LPF) and a voltage controlled oscillator (VCO). A simple configuration is shown in Figure 8.1.



Figure 8.1 Basic phase-locked loop

The phase detector has two inputs: one is an external signal, i.e. the reference clock or the incoming data, and the other is the VCO output. The phase difference between these two signals is detected and output by the phase detector. This error signal, normally converted to a voltage level, is passed through the low-pass filter, after which the dc component is passed to the VCO and adjusts the VCO frequency to minimize the phase error. When lock is attained, all the signals in the loop are stable. However, the phase difference is not required to be zero. Instead, it may be some static value so that a constant dc component is produced to keep the VCO running at a constant frequency, which in turn maintains the constant phase error.

#### 8.2 Loop Dynamic in Locked State

The transient response for tracking the input signal is generally a nonlinear process and hard to model. However, by assuming small drift in locked state, a linear approximation analysis can be applied to gain some basic ideas about the process. Figure 8.2 shows such a model.



Figure 8.2 Linear model of a PLL

The open-loop transfer function, denoted $H_o(s)$, is

$$\frac{\Phi_{out}(s)}{\Phi_{in}(s)} = H_o(s) = K_{PD}G_{LPF}(s) \frac{K_{VCO}}{s} \tag{8.1}$$

where:

 $K_{PD}$  is the gain constant of the phase detector, in V/rad

 $G_{LPF}(s)$  is the transfer function of the low-pass filter, in V/V

 $\frac{K_{VCO}}{s}$  is the transfer function of the VCO, with $K_{VCO}$ in rad/(V·s)

By letting $\Phi_{out}(s) = K_{PD}G_{LPF}(s) \frac{K_{VCO}}{s} (\Phi_{in}(s) - \Phi_{out}(s))$, the closed-loop transfer function can be derived as:

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{K_{PD} K_{VCO} G_{LPF}(s)}{s + K_{PD} K_{VCO} G_{LPF}(s)} \tag{8.2}$$

If the low-pass filter is realized by a resistor in series with a capacitor, its transfer function is:

$$G_{LPF}(s) = \frac{1}{1 + \frac{s}{\omega_{LPF}}} \tag{8.3}$$

where

$$\omega_{LPF} = \frac{1}{RC}$$

By substituting $G_{LPF}(s)$ into Equation 8.2, we get

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{K_{PD}K_{VCO}}{\frac{s^2}{\omega_{LPF}} + s + K_{PD}K_{VCO}} \tag{8.4}$$

indicating that the system is of second order, and $K = K_{PD}K_{VCO}$ is the loop gain.

The above equation can be rewritten in the standard second-order form

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{8.5}$$

where

$$\omega_n = \sqrt{\omega_{LPF}K} \tag{8.6}$$

$$\zeta = \frac{1}{2} \sqrt{\frac{\omega_{LPF}}{K}} \tag{8.7}$$

Here, $\omega_n$ is the geometric mean of the -3 dB bandwidth of the LPF and the loop gain, in a sense indicating the gain-bandwidth product of the loop. The damping factor $\zeta$ is inversely proportional to the square root of the loop gain. To provide an optimally flat frequency response, $\zeta$ is preferably equal to 0.707.
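The relations between the loop gain, the filter corner and the resulting second-order parameters can be checked numerically. The following is a minimal sketch based on the standard second-order form with $\omega_n = \sqrt{\omega_{LPF}K}$ and $\zeta = \frac{1}{2}\sqrt{\omega_{LPF}/K}$; the function names and the numerical values of `K_PD` and `K_VCO` are illustrative assumptions, not design values.

```python
import math


def second_order_params(k_pd, k_vco, omega_lpf):
    """Return (omega_n, zeta) for the loop with a single-pole RC filter."""
    k = k_pd * k_vco                       # loop gain K, in 1/s
    omega_n = math.sqrt(omega_lpf * k)     # natural frequency, rad/s
    zeta = 0.5 * math.sqrt(omega_lpf / k)  # damping factor
    return omega_n, zeta


def lpf_corner_for_damping(k_pd, k_vco, zeta):
    """Filter corner (rad/s) that yields the requested damping factor."""
    return 4.0 * zeta ** 2 * k_pd * k_vco


# Placeholder values (assumptions for illustration only):
K_PD = 0.1                    # V/rad
K_VCO = 2 * math.pi * 1e9     # rad/(V*s)

omega_lpf = lpf_corner_for_damping(K_PD, K_VCO, 1 / math.sqrt(2))
omega_n, zeta = second_order_params(K_PD, K_VCO, omega_lpf)
print(f"omega_LPF = {omega_lpf:.3e} rad/s, omega_n = {omega_n:.3e} rad/s, zeta = {zeta:.3f}")
```

For $\zeta = 0.707$ the required corner is $\omega_{LPF} = 2K$, which shows how tightly the filter bandwidth and the loop gain are coupled in this simple topology.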

The most common PD in this architecture is either the Gilbert multiplier or simply an XOR gate. Unfortunately, these types of PDs have a potential problem: the PLL may lock to a harmonic or sub-harmonic of the input signal. Nevertheless, this generic architecture will be the basis for our further analysis in the next section.

#### 8.3 Charge-Pump PLL

In digital communications, one of the most popular PLL architectures is the charge-pump PLL. Compared with the generic PLL discussed in the previous section, a charge-pump PLL consists of a phase/frequency detector (PFD) or a phase detector (PD) plus a charge pump circuit which will charge or discharge the LPF. A typical architecture is shown in Figure 8.3.<sup>[8]</sup>

There are some desirable features associated with charge-pump PLLs. First, they do not exhibit false lock. Second, if neglecting offsets and mismatches, the static phase error will be zero while the system is in lock. However, the above circuit has a potential problem of instability. As we can see, the transfer function for the PD is  $K_{PD}$  / s, hence the closed loop transfer function is

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{K_{PD}K_{VCO}}{s^2 + K_{PD}K_{VCO}} \tag{8.8}$$

which has two imaginary poles at  $\pm j\sqrt{K_{PD}K_{VCO}}$ . To keep the system stable, a zero can be added to the open-loop transfer function by placing a resistor in series with the charge-pump capacitor.



Figure 8.3 Charge-pump PLL

The switching operation of the charge pump makes the PLL a discrete-time system. However, by assuming that the loop bandwidth is much less than the input frequency which is normally met in practical circuits, we can use the average value of the discrete-time parameters to perform a continuous-time small signal analysis.

Suppose the phase error is $\Phi_e = \Phi_{in} - \Phi_{out}$. The average voltage change on the charge pump capacitor is then:

$$V_c(s) = \frac{I_{CP}\Phi_e}{2\pi}\left(R + \frac{1}{sC_P}\right) \tag{8.9}$$

where $I_{CP}$ is the output current of the charge pump.

By substituting $\Phi_{out}(s) = V_c(s) K_{VCO} / s$, we obtain the closed-loop transfer function of the charge-pump PLL as:

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{\frac{I_{CP}}{2\pi C_P} (sRC_P + 1) K_{VCO}}{s^2 + \left(\frac{I_{CP}}{2\pi} K_{VCO} R\right) s + \frac{I_{CP}}{2\pi C_P} K_{VCO}} \tag{8.10}$$

Thus, the system has a zero at $\omega_Z = -1/(RC_P)$, and its natural frequency and damping factor are

$$\omega_n = \sqrt{\frac{I_{CP}}{2\pi C_P} K_{VCO}} \tag{8.11}$$

$$\zeta = \frac{R}{2} \sqrt{\frac{I_{CP} C_P}{2\pi} K_{VCO}} \tag{8.12}$$

The above expressions indicate that the factor  $\omega_n$  and  $\zeta$  can be increased simultaneously, which is desirable for maximizing the loop bandwidth. This is another advantage of charge pump PLL.
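To make the dependence on the design parameters concrete, the natural frequency and damping factor can be evaluated numerically. The sketch below obtains them by matching the closed-loop denominator to $s^2 + 2\zeta\omega_n s + \omega_n^2$, starting from the loop gain $\frac{I_{CP}}{2\pi}\left(R + \frac{1}{sC_P}\right)\frac{K_{VCO}}{s}$. The function name and all component values are assumptions chosen for illustration; only the order of magnitude of $K_{VCO}$ follows the simulated VCO.

```python
import math


def charge_pump_loop(i_cp, c_p, r, k_vco):
    """Return (omega_n, zeta, omega_z) of the linearized charge-pump PLL."""
    omega_n = math.sqrt(i_cp * k_vco / (2 * math.pi * c_p))
    zeta = (r / 2) * math.sqrt(i_cp * c_p * k_vco / (2 * math.pi))
    omega_z = 1 / (r * c_p)          # magnitude of the stabilizing zero
    return omega_n, zeta, omega_z


# Hypothetical component values (not taken from the design):
I_CP = 50e-6                 # A
C_P = 100e-12                # F
R = 1e3                      # ohm
K_VCO = 2 * math.pi * 1e9    # rad/(V*s)

omega_n, zeta, omega_z = charge_pump_loop(I_CP, C_P, R, K_VCO)
omega_n2, zeta2, _ = charge_pump_loop(2 * I_CP, C_P, R, K_VCO)
print(f"omega_n = {omega_n:.3e} rad/s, zeta = {zeta:.3f}")
print(f"doubling I_CP scales both by {omega_n2 / omega_n:.3f} and {zeta2 / zeta:.3f}")
```

Doubling the charge pump current raises both $\omega_n$ and $\zeta$ by a factor of $\sqrt{2}$, in contrast to the generic loop, where a larger loop gain lowers the damping factor.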

But by adding a resistor to keep the loop stable, a ripple in the control voltage is also introduced. To suppress this ripple, a second capacitor could be connected in parallel with the RC branch, as shown in Figure 8.4. Normally this capacitor is chosen as 1/10 or less of Cp so that it has little effect on the loop dynamics.<sup>[8]</sup>



Figure 8.4 Loop filter with zero and ripple suppression

#### 8.4 Clock Recovery from NRZ Data

NRZ data has two attributes that make clock recovery difficult. First, it can exhibit consecutive ONE's or ZERO's, which requires the clock recovery circuit to maintain a constant clock frequency with the absence of data transitions. To minimize the frequency drift, a narrower bandwidth is preferable for the loop filter. Unfortunately, to suppress the jitter introduced by the VCO, a larger loop filter bandwidth is preferred. Regarding these two aspects, a trade-off is necessary. Second, NRZ data has no spectral energy at the baud rate, so it usually undergoes a nonlinear processing before reaching the PLL to create a frequency component at the baud rate. A common approach is to detect each data transition and generate corresponding pulses, for instance, using an XOR gate or a double-edge triggered flip-flop. A frequency detector should be used carefully in clock recovery circuits, since it will give incorrect output signals when data transitions are missing. But by clever design, a frequency detector can speed up the acquisition process when acquisition time is important. One approach is to first lock the VCO with an external reference clock using the frequency detector, then switch to a phase detector to lock the VCO with the data sequence.
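The absence of a spectral line at the baud rate, and its appearance after transition detection, can be illustrated with a few lines of code. The sketch below is an idealized discrete-time model: the bit pattern, the number of samples per bit and the half-bit delay are arbitrary assumptions, and the edge detector is modelled as an XOR of the data with a delayed copy of itself.

```python
import cmath


def upsample(bits, samples_per_bit):
    """Hold each bit for samples_per_bit samples to form an NRZ waveform."""
    return [b for b in bits for _ in range(samples_per_bit)]


def edge_pulses(samples, delay):
    """XOR the waveform with a copy delayed by `delay` samples."""
    return [samples[t] ^ samples[t - delay] if t >= delay else 0
            for t in range(len(samples))]


def bit_rate_tone(samples, samples_per_bit):
    """Normalized magnitude of the spectral component at the bit rate."""
    acc = sum(x * cmath.exp(-2j * cmath.pi * t / samples_per_bit)
              for t, x in enumerate(samples))
    return abs(acc) / len(samples)


BITS = [0, 1, 1, 0, 1, 0, 0, 1]   # arbitrary example pattern
N = 16                            # samples per bit

nrz = upsample(BITS, N)
pulses = edge_pulses(nrz, N // 2)  # one half-bit pulse per data transition
print(f"NRZ tone: {bit_rate_tone(nrz, N):.2e}, edge-pulse tone: {bit_rate_tone(pulses, N):.2e}")
```

Every NRZ bit spans a whole number of periods of the bit-rate tone, so its contribution cancels, whereas the half-bit pulses all start at the same phase and add coherently.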

#### 9 DESIGN OF A CHARGE-PUMP CLOCK RECOVERY CIRCUIT

This chapter deals with the design considerations for the proposed charge-pump clock recovery circuit, based on the theoretical analysis of the previous chapter. As discussed before, a charge-pump PLL consists of a VCO, phase detector, charge pump and a loop filter. First each of the components is designed, simulated and characterized separately, then a SPICE simulation is done on the system level. Finally some important characteristics are estimated from the simulation.

#### 9.1 Voltage Controlled Oscillator (VCO)

The VCO is an essential part in PLLs or CRCs, especially for high speed applications. Among various types of CMOS VCOs, ring oscillators are commonly used. A typical ring oscillator schematic is shown in Figure 9.1.



Figure 9.1 A simple topology of ring oscillator

An odd number M of inverting gain stages are connected in series and the last output is fed back to the input. While oscillating, the total phase shift is zero and the loop gain is unity. <sup>[8]</sup> It can be easily derived that the oscillation frequency is $f_o = 1/(2MT_d)$, where $T_d$ is the delay of each stage with a fan-out of one.
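The standard ring-oscillator relation $f_o = 1/(2MT_d)$ follows because a signal edge must travel around the ring twice, once for each polarity, to complete one period. A minimal sketch of the calculation is given below; the function names and the example of a four-stage ring at 1 GHz are chosen for illustration only.

```python
def ring_frequency(stages, stage_delay):
    """Oscillation frequency (Hz) of a ring with `stages` delay cells."""
    return 1.0 / (2 * stages * stage_delay)


def required_stage_delay(stages, frequency):
    """Per-stage delay (s) needed to oscillate at `frequency`."""
    return 1.0 / (2 * stages * frequency)


t_d = required_stage_delay(4, 1e9)
print(f"per-stage delay for 4 stages at 1 GHz: {t_d * 1e12:.1f} ps")
```

This gives a feel for the speed required of each delay cell: the per-stage delay budget shrinks in proportion to the number of stages.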

In the design of a VCO, the following parameters are of interest: (1) Tuning Range; (2) Linearity of the VCO transfer function; (3) Duty cycle; (4) Jitter and Phase Noise; (5) Supply and substrate noise rejection.

In the present application, a 50% duty cycle is critical to get the best performance of the phase detector. We will come to this issue below. Also, inside the mixed-signal transceiver, the digital part will inject noise into the common substrate, hence a highly noise immune VCO is desired. [8]

With the above goals in mind, we designed a VCO having a fully differential signal path and an internal feedback loop to adjust the duty cycle to 50% at different running frequencies. The schematic is shown in Figure 9.2.



Figure 9.2 Designed ring oscillator with self-adjusted 50% duty-cycle

The ring oscillator consists of four gain stages. Each stage has a pair of sinusoid-like signal outputs ( $V_0$+ and $V_0$- ) to drive the next stage and a pair of digital clock outputs (CLK+ and CLK-) to drive the digital circuits on chip. A differential opamp is added to detect the common-mode level of the differential digital clocks, and its output is sent back to each stage to adjust the duty cycle. Ideally, with a 50% duty cycle, the common-mode level is at half of the power supply, so the reference input is set to 1.65 V. Two compensation capacitors are added at the other input and at the output node to improve the stability of the loop.

The device sizes are listed in Table 9.1.

Table 9.1 Transistor sizes of the feedback opamp

| R     | I<sub>TAIL</sub> | MPdif1   | MPdif2   | MNld1    | MNld2    | C1  | C2   |
|-------|------------------|----------|----------|----------|----------|-----|------|
| 10 kΩ | 8 µA             | 24μ/1.2μ | 24μ/1.2μ | 12μ/1.2μ | 12μ/1.2μ | 5pF | 10pF |



Figure 9.3 Frequency response of the feedback amplifier



Figure 9.4 Gain stage of the ring oscillator

Figure 9.3 shows the frequency response of the first stage amplifier. The simulated dc gain of the feedback amplifier is between 27 dB and 30 dB.

Inside each gain stage, the circuit can be divided into three parts, as shown in Figure 9.4. The first one is the differential delay cell, whose resistive loads are two PMOS transistors biased in the triode region. Adjusting their on-resistance changes the RC time constants and the output voltage swings, and hence the delay of each cell. The advantage of this control scheme is that the on-resistance is more linear with the control voltage in the triode region than in the saturation region. The two additional diode-connected PMOS transistors help to keep the tail transistor in the saturation region during the whole switching cycle by preventing the output voltages from going too low.

The second part is the differential-to-single-end converter which also isolates the delay cell from the duty cycle feedback loop. The differential outputs of the delay cell, which have a frequency-dependent voltage swing from Vdd to about 1.5 V~1.8 V, are sent to two PMOS transistors - MPdif1 and MPdif2. With MPdif1 and MPdif2 switching between on and off, a single-ended output is generated.

The third part is the buffer stage consisting of ratioed inverters in series to enhance the driving capability of the digital clocks.

As we need a differential clock to drive the phase detector, the second part and third part are duplicated but their inputs are swapped. These two separate clock paths could introduce jitter due to their mismatch, so a careful layout is absolutely necessary to minimize this effect.

It is worth noting that an extra NMOS transistor is added at the converter output to control the duty cycle of the digital clock. As mentioned before, the differential outputs of the delay cell have a variable voltage swing, which affects the output range of the converter and in turn results in a frequency-dependent duty cycle of the buffered clock. To suppress the duty cycle drift, the added NMOS transistor keeps the converter output swing relatively constant, with its gate voltage controlled by the feedback opamp.

The device sizes are listed in Table 9.2.

MPsat1 MPbia1 Mpbia2 MNdif1 MNdif2 Mpsat2  $12\mu/0.6\mu$  $12\mu/0.6\mu$  $28.8 \mu / 0.6 \mu$  $28.8 \mu / 0.6 \mu$  $30\mu/0.6\mu$  $30\mu/0.6\mu$ MPdif2 MNld1 MNtai1 Mpdif1 MNld2 MNduty  $24 \mu / 0.6 \mu$  $7.2 \mu / 0.6 \mu$  $132\mu/1.2\mu$  $24 \mu / 0.6 \mu$  $7.2\mu/0.6\mu$  $12\mu/0.6\mu$ 

Table 9.2 Transistor sizes of the VCO gain stage

Table 9.3 Voltage-frequency transfer function of the VCO at 25  $^{\circ}\text{C}$ 

| V<sub>CTRL</sub> (V) | 0.8  | 0.85 | 0.9  | 0.95 | 1.0  | 1.05 | 1.1  | 1.15 |
|----------------------|------|------|------|------|------|------|------|------|
| f (MHz)              | 1390 | 1335 | 1280 | 1225 | 1180 | 1140 | 1096 | 1050 |
| V<sub>CTRL</sub> (V) | 1.2  | 1.25 | 1.3  | 1.35 | 1.4  | 1.45 | 1.5  |      |
| f (MHz)              | 1005 | 965  | 925  | 895  | 860  | 835  | 810  |      |



Figure 9.5 Transfer function of designed ring oscillator



Figure 9.6 Clock waveform at 25 °C



Figure 9.7 Clock waveform at 60 °C

Based on simulation, we estimated the gain  $K_{VCO}$  of the VCO as  $2\pi \times 1GHz/V$  at 25 °C and as  $2\pi \times 800MHz/V$  at 60 °C. Figure 9.6 and Figure 9.7 show the simulated results of the feedback control voltage and clock waveform.
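A gain estimate of this kind can be cross-checked against the tabulated transfer function. The sketch below takes the 25 °C data of Table 9.3 and computes the local and average slopes by finite differences. It is only a rough check: the variable names are illustrative, and which portion of the curve the quoted $K_{VCO}$ refers to is an assumption.

```python
import math

# Data of Table 9.3 (25 C): control voltage in V, frequency in MHz.
V_CTRL = [0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15,
          1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50]
F_MHZ = [1390, 1335, 1280, 1225, 1180, 1140, 1096, 1050,
         1005, 965, 925, 895, 860, 835, 810]

# Local slope between adjacent table entries, in MHz/V.
local = [(F_MHZ[i + 1] - F_MHZ[i]) / (V_CTRL[i + 1] - V_CTRL[i])
         for i in range(len(V_CTRL) - 1)]

# Average slope over the whole tabulated range, in MHz/V.
average = (F_MHZ[-1] - F_MHZ[0]) / (V_CTRL[-1] - V_CTRL[0])

steepest = max(abs(s) for s in local)
shallowest = min(abs(s) for s in local)
k_vco_avg = 2 * math.pi * abs(average) * 1e6   # rad/(V*s)

print(f"local |slope| from {shallowest:.0f} to {steepest:.0f} MHz/V")
print(f"average slope {average:.0f} MHz/V, |K_VCO| about {k_vco_avg:.3e} rad/(V*s)")
```

The slope is negative, since the frequency falls as the control voltage rises, and its magnitude is largest at the low end of the control range, where it is close to 1 GHz/V.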

#### 9.2 Phase Detector

The phase detector can be realized in different ways, such as a Gilbert multiplier, an XOR gate or circuits involving registers. Of particular interest is the linearity between the output and the input phase difference. For charge-pump PLLs or CRCs, the phase detector has three working modes, which are listed in Table 9.4.

The phase detector we used is a self-centered type, which was first reported by Hogge. <sup>[9]</sup> The schematic and a timing diagram for an arbitrary data stream are shown in Figure 9.8 and Figure 9.9.

Table 9.4 Phase detector working states

|                        | Up | Down |
|------------------------|----|------|
| Clock Lags Data        | 1  | 0    |
| Clock Leads Data       | 0  | 1    |
| Clock Locked With Data | 0  | 0    |



Figure 9.8 Phase detector used in the CRC

The input data is sampled at one clock edge, i.e. falling edge, by the first D flip-flop and sampled at the other clock edge by the second D flip-flop. The data at nodes 1, 2, 3 are sent to two XOR gates to generate the UP and DOWN signal which control the charge pump. When there is a data transition, two pulses are generated. It can be observed that the UP pulse is always of half bit period width, and the DOWN pulse width depends on the phase error of the clock falling edge and the data transition. If the CRC loop is locked with the data, the control voltage to the VCO should be constant. This requires the DC component of UP and DOWN to be the same, which can be realized by aligning the clock edge at the center of the data period, as shown in Figure 9.9.
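The behaviour described above can be reproduced with an idealized discrete-time model. The sketch below assumes zero gate delay, one clock period per bit, and hypothetical names throughout. It accumulates the total width of the pulse whose width is fixed at half a bit period (the UP pulse in the description above) and of the pulse whose width depends on the sampling instant (the DOWN pulse).

```python
def hogge_pulse_widths(bits, steps_per_bit=100, sample_offset=50):
    """Return (variable_width, fixed_width) in time steps, summed over the stream.

    sample_offset is the position of the first sampling edge inside the bit
    period; steps_per_bit // 2 corresponds to the centre of the data eye.
    """
    n = steps_per_bit
    half = n // 2
    q1 = q2 = bits[0]             # both flip-flops start settled
    variable = fixed = 0
    for t in range(len(bits) * n):
        data = bits[t // n]
        phase = t % n
        if phase == sample_offset % n:
            q1 = data             # first DFF samples the data
        if phase == (sample_offset + half) % n:
            q2 = q1               # second DFF samples half a period later
        variable += data ^ q1     # XOR of nodes 1 and 2
        fixed += q1 ^ q2          # XOR of nodes 2 and 3
    return variable, fixed


# Arbitrary pattern with four transitions, ending in a run so every pulse completes.
BITS = [0, 1, 1, 0, 1, 0, 0, 0]
centered = hogge_pulse_widths(BITS, 100, 50)
late = hogge_pulse_widths(BITS, 100, 60)
early = hogge_pulse_widths(BITS, 100, 40)
print(centered, late, early)
```

When the sampling edge sits at the centre of the bit, the two accumulated widths are equal and the net charge delivered to the loop filter is zero. Moving the edge changes only the variable-width pulse, in proportion to the timing error.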

Since there's only half a clock period for the data to be settled and sampled at the second D flip-flop, a very high speed D flip-flop is needed. The D flip-flop used in the parallel-to-serial converter is not suitable since there are three stages of delay between the triggering edge and the Q output. Here we go to the simplest solution -- a complementary transmission gate followed by an inverter serves as both master and slave stage. Usually the complementary clock is not preferable in other applications, but, since our VCO can provide fully in-phase complementary clocks, we can take advantage of it.

The schematic of the D flip-flop is shown in Figure 9.10 and the device sizes are listed in Table 9.5. Table 9.6 shows the SPICE simulation results.



Figure 9.9 Signal waveform of the phase detector while in lock



Figure 9.10 D flip-flop used in the phase detector

Table 9.5 Transistor sizes of DFF1 and DFF2

|      | MPtg     | MNtg     | MPinv    | MNinv    |
|------|----------|----------|----------|----------|
| DFF1 | 18μ/0.6μ | 18μ/0.6μ | 36μ/0.6μ | 24μ/0.6μ |
| DFF2 | 9μ/0.6μ  | 9μ/0.6μ  | 18μ/0.6μ | 12μ/0.6μ |

Table 9.6 Timing characteristics of the DFF

| Parameter         | Value                |
|-------------------|----------------------|
| Power Dissipation | 3mW                  |
| Delay             | 150ps for "0" to "1" |
|                   | 160ps for "1" to "0" |
| Rise Time         | 140ps                |
| Fall Time         | 130ps                |

The XOR gate is realized with transmission gate logic, which can work at a higher frequency than standard CMOS logic. The schematic is shown in Figure 9.11.

A minor modification is made here. Instead of using complementary transmission gates, we used simple NMOS transistors as the transmission gates to limit the output voltage swing, which further reduces the dead zone of the charge pump circuit. This will be described later in the design of the charge pump.



Figure 9.11 XOR gate used in the phase detector

As the phase detector works at very high frequency, the gate delay cannot be neglected for a precise phase detection. So in Figure 9.8, a buffer X1 is added to node 1 to cancel the delay between the triggering clock edge and the D flip-flop output. Though the outputs of the phase detector are two pulses, we can still characterize the dc component difference between them.

#### 9.3 Charge Pump and Loop Filter

The charge pump circuit is basically a voltage-controlled current source which has three working modes according to the working states of the phase detector: (1) when UP is high, charge pump sources charging current to the loop filter; (2) when DOWN is high, charge pump sinks discharging current from the loop filter; and (3) when both UP and DOWN are low, charge pump leaves alone the loop filter and the voltage on the loop filter is maintained.
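The three working modes can be captured in a simple behavioural model. The sketch below is an idealization, not the transistor-level circuit: the charge pump is treated as an ideal switched current source feeding a single capacitor. The component values are hypothetical, and the case where UP and DOWN are high at the same time, which is not covered by the three modes above, is assumed to cancel.

```python
def charge_pump_current(up, down, i_cp):
    """Net current delivered to the loop filter for the given UP/DOWN state."""
    if up and not down:
        return i_cp       # mode 1: source charging current
    if down and not up:
        return -i_cp      # mode 2: sink discharging current
    return 0.0            # mode 3: hold (both-high is assumed to cancel)


def integrate(v_start, pulses, i_cp, c_p, dt):
    """Control voltage after applying a sequence of (up, down) states."""
    v = v_start
    for up, down in pulses:
        v += charge_pump_current(up, down, i_cp) * dt / c_p
    return v


# Hypothetical values: 50 uA pump current, 10 pF capacitor, 100 ps time step.
I_CP, C_P, DT = 50e-6, 10e-12, 100e-12
sequence = [(1, 0)] * 5 + [(0, 1)] * 3 + [(0, 0)] * 2
v_end = integrate(1.0, sequence, I_CP, C_P, DT)
print(f"control voltage moved by {(v_end - 1.0) * 1e3:.3f} mV")
```

Only the difference between the UP and DOWN pulse widths moves the control voltage, which is why matching the two current paths matters for the static phase error.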

In our application, the charge pump is a modified version of a previously proposed circuit, which is shown in Figure 9.12.



Figure 9.12 A previously reported charge pump circuit

The reported charge pump controls the output current by changing the gate voltages of the current mirror transistors MPout and MNout. Compared with the topology in Figure 8.3, this scheme avoids fully switching the devices in the current path on and off, and hence gives a higher speed and better matching of the charge and discharge currents. However, the two input transistors are PMOS and NMOS respectively, which require both positive and negative pulses. Due to the electrical mismatch between PMOS and NMOS, there will be some offset between the UP and DOWN pulse widths for a zero output. At a very high speed, that offset leads to an intolerable static phase error. More importantly, since the UP and DOWN signals from the phase detector are both positive pulses whose width is dominated by the rise and fall times, it is almost impossible to invert a pulse with precise unity gain. So this charge pump cannot be used in our PLL. One solution is to change the PMOS input into an NMOS input by using a current mirror. The modified schematic is shown in Figure 9.13.



Figure 9.13 Modified charge pump with two NMOS inputs

The DOWN signal, which is a pulse applied to MNdw, pulls down the gate voltage of MN4 and hence reduces the current of MN4 and MP4. Through the mirror formed by MP4 and MPout, the drain current of MPout also drops, so MNout sinks discharging current from the loop filter. A similar scheme applies to the UP signal, except that there is no current mirror in the signal path.

When determining the size of each transistor, some important considerations should be noted. The voltage swings at the gates of MN4 and MNout are critical to the performance of the charge pump. For instance, consider the case in which a positive pulse is strong enough to drive the gate voltage of MN4 or MNout to ground, so that the transistor is fully cut off. When the pulse disappears, some extra time is needed for the gate voltage to recover from ground to the threshold voltage, hence a dead zone is introduced into the charge pump. Even worse, at such a high frequency, when the pulses arrive in rapid succession, the gate voltages of MN4 and MNout might not have enough time to rise above the threshold voltage, which would disable the charge pump. So we decided to size the related transistors (MNdw and MNup) to keep the lowest gate voltage of MN4 and MNout around the threshold value. Through simulation, the device sizes were chosen as in Table 9.7.

MNdw MN1 MP1 MNup **Iref** MN<sub>0</sub> MN2  $4.2\mu/1.2\mu$  $4.8\mu/1.2\mu$  $1.8 \mu / 1.8 \mu$  $1.8\mu/1.8\mu$  $2.4 \mu / 1.2 \mu$ 50μΑ  $4.5\mu/1.2\mu$ MP2 MN<sub>3</sub> MP3 MN4 MP4 MNout **MPout**  $4.8 \mu / 1.2 \mu$  $2.4 \mu / 1.2 \mu$  $4.8\mu/1.2\mu$  $1.8\mu/1.2\mu$  $1.8\mu/1.2\mu$  $1.8\mu/1.2\mu$  $1.8\mu/1.2\mu$ 

Table 9.7 Transistor sizes of the charge pump circuit

Another consideration is that we want the output quiescent point to be close to the voltage at which the VCO oscillates at the expected frequency. The UP and DOWN pulses then have the same width, which guarantees that the sampling edge is placed at the optimum point, the middle of the data period.

A loop filter is used to extract the dc difference between the UP and DOWN signals from the charge pump. It works as an integrator: it integrates the charge and discharge currents, filters out the high-frequency components and applies the resulting dc voltage to the VCO. Our loop filter has the most common configuration, which is shown in Figure 9.14.

Cp is the integrating capacitor, which stores the dc component. The series resistor Rz is introduced to adjust the damping factor, as discussed in the previous chapter. Cz is added to filter the ripple introduced by Rz.



Figure 9.14 Loop filter used in the phase detector

## 9.4 Measurement of $K_{PD}$, Loop Bandwidth and Damping Factor

The phase detector, charge pump and loop filter can be combined and treated as a whole as the phase detector in the basic topology of the previous chapter. Because the charge and discharge currents depend on the UP and DOWN pulse widths, it is hard to give an accurate model of this combined phase detector. However, an approximate range of $K_{PD}$ can be derived from simulation.

By adjusting the input phase error, defined as the deviation from the static value of 180 degrees, we can measure the peak charging and discharging current $I_{CP}$ into the loop filter. In Table 9.8 below, $\Delta t$ is the time deviation from the nominal point, the middle of the data period, and $\Delta \phi = \frac{\Delta t}{T/2}\pi$.

$K_{PD}$ is the slope $\Delta I_{CP}/\Delta \phi$ at the point of zero phase error (the static phase error point). From Figure 9.15, we estimated $K_{PD}=32.26\,\mu A/\pi$. The voltage transfer function including the loop filter is then
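The conversion from time offset to phase error can be sketched in a few lines. This is a minimal illustration, assuming the 900 ps data period used in the closed-loop simulation of Section 9.6; the function name is hypothetical.

```python
def phase_error_in_pi(dt_ps: float, period_ps: float = 900.0) -> float:
    """Return the phase error as a multiple of pi: dphi / pi = dt / (T / 2)."""
    return dt_ps / (period_ps / 2.0)


# Evaluate a few of the time offsets that appear in Table 9.8.
for dt in (-32, -2, 2, 32):
    print(f"dt = {dt:+d} ps -> dphi = {phase_error_in_pi(dt):+.3f} pi")
```

With the assumed period, a 32 ps offset corresponds to roughly 7% of π, which is the edge of the range covered by Table 9.8.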

$$K_{PD} \times Z_{LPF} = K_{PD} \times \frac{1 + sRC_P}{s\left(1 + s\frac{RC_PC_Z}{C_P + C_Z}\right)(C_P + C_Z)} \tag{9.1}$$
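As a sanity check on Equation 9.1, the sketch below compares the closed-form impedance with a direct computation of the network it implies, namely the series branch R and $C_P$ shunted by $C_Z$. The values R = 5 kΩ and $C_P$ = 30 pF are the ones used in the calculations of this section; the value of $C_Z$ is not given in the text, so the 3 pF used here is purely hypothetical, as are the function names.

```python
import math


def z_lpf_formula(s: complex, r: float, cp: float, cz: float) -> complex:
    """Loop filter impedance as written in Equation 9.1 (without K_PD)."""
    tau_pole = r * cp * cz / (cp + cz)
    return (1 + s * r * cp) / (s * (1 + s * tau_pole) * (cp + cz))


def z_lpf_network(s: complex, r: float, cp: float, cz: float) -> complex:
    """Same impedance from the network: (R + 1/(s*Cp)) in parallel with 1/(s*Cz)."""
    z_series = r + 1 / (s * cp)
    z_shunt = 1 / (s * cz)
    return z_series * z_shunt / (z_series + z_shunt)


R = 5e3      # ohm, from the damping factor calculation
CP = 30e-12  # farad, from the bandwidth calculation
CZ = 3e-12   # farad, hypothetical: C_Z is not specified in the text

for f_hz in (1e5, 1e6, 1e7, 1e8):
    s = 2j * math.pi * f_hz
    zf = z_lpf_formula(s, R, CP, CZ)
    zn = z_lpf_network(s, R, CP, CZ)
    print(f"f = {f_hz:.0e} Hz: |Z| = {abs(zf):.3e} ohm, "
          f"relative difference = {abs(zf - zn) / abs(zn):.1e}")
```

The two computations agree to rounding error, which supports reading Equation 9.1 as the impedance of that second-order passive network.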

Table 9.8 Measured output current at different phase error

| Δt (ps) | Δφ      | $I_{CP}$ (μA) |
|---------|---------|---------------|
| -32     | -0.071π | 2.745         |
| -30     | -0.067π | 2.579         |
| -28     | -0.062π | 2.407         |
| -26     | -0.058π | 2.241         |
| -24     | -0.053π | 2.064         |
| -22     | -0.049π | 1.88          |
| -20     | -0.045π | 1.695         |
| -18     | -0.04π  | 1.52          |
| -16     | -0.036π | 1.356         |
| -14     | -0.031π | 1.195         |
| -12     | -0.027π | 1.032         |
| -10     | -0.022π | 0.871         |
| -8      | -0.018π | 0.696         |
| -6      | -0.014π | 0.514         |
| -4      | -0.009π | 0.33          |
| -2      | -0.004π | 0.16          |
| 2       | 0.004π  | -0.142        |
| 4       | 0.009π  | -0.288        |
| 6       | 0.014π  | -0.434        |
| 8       | 0.018π  | -0.585        |
| 10      | 0.022π  | -0.742        |
| 12      | 0.027π  | -0.898        |
| 14      | 0.031π  | -1.056        |
| 16      | 0.036π  | -1.207        |
| 18      | 0.04π   | -1.355        |
| 20      | 0.045π  | -1.494        |
| 22      | 0.049π  | -1.643        |
| 24      | 0.053π  | -1.779        |
| 26      | 0.058π  | -1.928        |
| 28      | 0.062π  | -2.069        |
| 30      | 0.067π  | -2.202        |
| 32      | 0.071π  | -2.332        |
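One way to cross-check the order of magnitude of $K_{PD}$ is a straight-line fit through the tabulated points. The sketch below is only an illustration: it assumes a 900 ps data period, uses an ordinary least-squares fit over all points of Table 9.8, and its helper name is hypothetical. The fitted slope depends on which points are included, so it need not reproduce exactly the value read from Figure 9.15.

```python
# Samples copied from Table 9.8: time offset (ps) and peak current I_CP (uA).
DT_PS = list(range(-32, 0, 2)) + list(range(2, 34, 2))
I_CP_UA = [
    2.745, 2.579, 2.407, 2.241, 2.064, 1.88, 1.695, 1.52,
    1.356, 1.195, 1.032, 0.871, 0.696, 0.514, 0.33, 0.16,
    -0.142, -0.288, -0.434, -0.585, -0.742, -0.898, -1.056, -1.207,
    -1.355, -1.494, -1.643, -1.779, -1.928, -2.069, -2.202, -2.332,
]
PERIOD_PS = 900.0  # assumed data period (see Section 9.6)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Phase error expressed as a multiple of pi, so the slope is in uA per pi rad.
dphi_in_pi = [dt / (PERIOD_PS / 2.0) for dt in DT_PS]
slope = least_squares_slope(dphi_in_pi, I_CP_UA)
print(f"fitted slope: {slope:.2f} uA per pi rad (magnitude {abs(slope):.2f})")
```

The sign of the slope simply reflects the sign convention of the table; its magnitude is the quantity to compare with $K_{PD}$.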



Figure 9.15  $\Delta \phi$ -I transfer function of the phase detector

The bandwidth can be calculated using Equation 8.11, where we approximate $I_{CP}$ as 1 $\mu A$.

$$\omega_n = \sqrt{\frac{I_{CP}}{2\pi C_P} K_{VCO}} = \sqrt{\frac{1\,\mu A}{2\pi \times 30\,pF} \times 2\pi \times 1G/V \cdot s} = 5.77 \times 10^6 = 5.77\,MHz$$

The damping factor can be calculated using Equation 8.12.

$$\zeta = \frac{R}{2} \sqrt{\frac{I_{CP} \cdot C_P}{2\pi} K_{VCO}} = \frac{5K}{2} \sqrt{\frac{1\mu A \times 30 \, pF}{2\pi} \times 2\pi \times 1G \, / \, V \cdot s} = 0.433$$
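The two numerical results can be reproduced with the short sketch below. It assumes that $K_{VCO}$ sits under the square root in both expressions, which is what the quoted numbers imply, and it takes the component values directly from the equations above. Note that with $K_{VCO}$ expressed in rad/(s·V) the first expression evaluates to an angular frequency; the value divided by 2π is printed only for reference.

```python
import math

I_CP = 1e-6                   # A, approximate charge pump current
C_P = 30e-12                  # F, integrating capacitor
R = 5e3                       # ohm, series resistor
K_VCO = 2 * math.pi * 1e9     # rad/(s*V), VCO gain of 1 GHz/V

# Natural frequency and damping factor of the charge-pump PLL.
omega_n = math.sqrt(I_CP * K_VCO / (2 * math.pi * C_P))
zeta = (R / 2) * math.sqrt(I_CP * C_P * K_VCO / (2 * math.pi))

print(f"omega_n = {omega_n:.3e} (angular units), omega_n / 2pi = {omega_n / (2 * math.pi):.3e} Hz")
print(f"zeta    = {zeta:.3f}")
```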

## 9.5 Startup Circuit

The VCO has a tuning range of 800 MHz to 1.4 GHz, corresponding to a control voltage range of 1.5 V to 0.85 V. Simulation shows that when the control voltage is lower than 0.8 V, the VCO is unstable or does not oscillate at all. To ensure that the VCO oscillates when the circuit is powered on, a startup circuit is added to precharge the loop filter to 1.4 V. After that, the charging path is cut off and only adds some parasitic capacitance to the loop filter, which is negligible compared to Cp and Cz.



Figure 9.16 Designed startup circuit to precharge the loop filter

The schematic of the startup circuit is shown in Figure 9.16. MPw is a weak PMOS that is always on but supplies only a very small current. If the voltage on Cp is near ground, MPw pulls up the drain of MNs and MPs until MNch turns on and charges Cp. Meanwhile, MPs also turns on and provides a larger current to MNs, so MNch has to charge the gate of MNs to above 1.4 V before MNs can pull its drain down again. MPs and MNch then turn off, leaving MNs weakly on, and the PLL is ready to work.

## 9.6 Closed Loop Simulation

The top level schematic of the closed loop is shown in Figure 9.17.

Block 1 is the pseudo-random NRZ data generator, which consists of 16 D flip-flops and an XOR gate. The period of the repeated data sequence is about 200 ns. This generator is driven by a clock with a period of 900 ps, equivalent to a frequency of 1.1 GHz. Block 2 is the equivalent phase detector and block 3 is the 4-stage self-adjusted 50% duty cycle VCO.
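A generator built from D flip-flops and a single XOR gate behaves like a linear feedback shift register. The sketch below is a behavioral illustration only: the tap positions and the seed are hypothetical, since the text does not specify them, and no attempt is made to reproduce the 200 ns repeat period quoted above.

```python
NBITS = 16          # number of D flip-flops in the shift register
TAPS = (16, 15)     # hypothetical tap positions (stage 1 is the first flip-flop)
MASK = (1 << NBITS) - 1


def step(state: int) -> int:
    """Advance the register by one clock: shift and insert the XOR of the taps."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & MASK


def nrz_bits(seed: int, count: int):
    """Return `count` output bits, taken from the last flip-flop each clock."""
    bits, state = [], seed
    for _ in range(count):
        bits.append((state >> (NBITS - 1)) & 1)
        state = step(state)
    return bits


def sequence_period(seed: int) -> int:
    """Count the clocks until the register returns to its seed state."""
    state, clocks = step(seed), 1
    while state != seed:
        state, clocks = step(state), clocks + 1
    return clocks


SEED = 1  # hypothetical non-zero initial state
print("first 32 bits:", "".join(str(b) for b in nrz_bits(SEED, 32)))
print("repeat period in clocks:", sequence_period(SEED))
```

Because the last stage is one of the taps, each state has a unique predecessor, so the register always returns to a non-zero seed and the period search terminates.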



Figure 9.17 Closed loop schematic

An assumption in this simulation is that the incoming differential data has already been converted to a single-ended rail-to-rail NRZ signal.

First, a simulation is done with the startup circuit. The control voltage and duty cycle feedback are shown in Figure 9.18. From the waveform, we measured a peak-to-peak ripple of 35 mV on the control voltage, corresponding to a frequency deviation of 35 MHz, or 3% of the nominal frequency. The waveform also shows that, after the CRC is locked, both the VCO control voltage and the duty cycle control voltage become stable. The time to lock at 60 °C is longer than at 25 °C because the control voltages need to go lower. Fourier transforms of the locked clocks are shown in Figure 9.19. As mentioned before, the duty cycle of the clock is essential to our phase detector. The waveforms of the recovered clock at 25 °C and 60 °C are shown in Figure 9.20.



Figure 9.18 CRC with startup circuit



Figure 9.19 DFT of the recovered clock



Figure 9.20 Waveform of the recovered clock



Figure 9.21 Transient current of the power supply



Figure 9.22 Average power dissipation of CRC



Figure 9.23 Regenerated data at 1.1Gbps, 25 °C



Figure 9.24 Regenerated data at 1.1Gbps, 60 °C

Transient supply current and average power dissipation, which are about 50 mA and 170 mW respectively, are shown in Figure 9.21 and Figure 9.22. Two segments of the regenerated output data from the phase detector are exactly the same as the input data except for a delay of 1 clock period, as shown in Figure 9.23 and 9.24.

Further measurements are based on the following two cases, in both of which the loop is already in lock: (1) the incoming data rate changes abruptly; and (2) the incoming data rate changes slowly.

In the first case, an abrupt change of the data rate causes the CRC to temporarily lose lock and then try to regain it. This is similar to clock acquisition, in which the CRC loop is initially running at some frequency $\omega_0$, then incoming data with frequency $\omega_{in} = \omega_0 \pm \Delta \omega$ arrives and the CRC needs to lock onto this input frequency. Here the maximum $\Delta \omega$ from which the CRC can regain lock is not exactly the so-called capture range, because $\omega_0$ is not the CRC free-running frequency. But it indicates how large an abrupt frequency change the CRC can tolerate, and we call it the impulse capture range.

The second case differs from the first in that the data rate changes so slowly that the CRC can always track the change and stay in lock. This case gives an idea of the lock range, which is the maximum frequency range over which the CRC stays in lock before it begins to lose it. Theoretically, it is determined by the linear or monotonic ranges of the phase detector and the VCO. In both cases, we omitted the startup circuit and preset an initial condition on Cp instead to save simulation time.

To measure those parameters, we need a variable frequency source, which can be generated in two ways: (1) by a VCO and a pseudo-random data generator; and (2) by an NRZ data file. For the former case, a block diagram is shown in Figure 9.25.



Figure 9.25 Configuration 1 for testing the impulse capture range

The PWL voltage applied to VCO1 has a step increment/decrement, and the control voltage to VCO2 is expected to track the steps. From the simulation, a 30 mV step is the largest from which the CRC can regain lock. Given the VCO gain of 1 GHz/V, this corresponds to ±30 MHz, so the impulse capture range is 60 MHz peak-to-peak.

The simulation results at 25 °C and 60 °C are shown in Figure 9.26 - Figure 9.29, which show that the VCO control voltage tracks the PWL step voltage.
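The arithmetic behind this figure is shown below as a small sketch. It assumes the VCO gain is constant at 1 GHz/V around the operating point and that steps of either sign are tolerated equally; the names are illustrative.

```python
K_VCO_HZ_PER_V = 1e9   # VCO gain quoted in the text (1 GHz/V), assumed constant
MAX_STEP_V = 30e-3     # largest control-voltage step from which lock is regained


def frequency_change_hz(delta_v: float, gain: float = K_VCO_HZ_PER_V) -> float:
    """Frequency change produced by a control-voltage step, for a linear VCO."""
    return gain * delta_v


one_sided_hz = frequency_change_hz(MAX_STEP_V)
peak_to_peak_hz = 2 * one_sided_hz  # steps of +/- MAX_STEP_V

print(f"one-sided range:    {one_sided_hz / 1e6:.0f} MHz")
print(f"peak-to-peak range: {peak_to_peak_hz / 1e6:.0f} MHz")
```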



Figure 9.26 Step control voltage at 25 °C



Figure 9.27 DFT of the recovered clocks at 25 °C



Figure 9.28 Step control voltage at 60 °C



Figure 9.29 DFT of the recovered clocks at 60 °C

An alternative way to measure the impulse capture range is to directly apply the data file with variable frequency to the CRC and check whether the clock is recovered at exactly the same frequency. The PWL data file, with a duration of 9 µs, is generated by a C program which was written by Mr. Satyaki Koneru. The block diagram is shown in Figure 9.30. The buf cell does nothing except for degrading the data waveform.

Here the PWL source generates five levels of data rate: 1.163 Gbps (T=860 ps), 1.136 Gbps (T=880 ps), 1.111 Gbps (T=900 ps), 1.087 Gbps (T=920 ps) and 1.064 Gbps (T=940 ps). Simulations were done at both 25 °C and 60 °C and the results are shown in Figure 9.31 - Figure 9.34. From the waveform, we can see the jumps of the VCO control voltage due to the abrupt changes of input data rate. After several cycles of ripple, the control voltage eventually becomes stable, which indicates that the CRC is locked to the input data.
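The data rates follow directly from the bit periods, and the size of each jump between adjacent levels can be compared against the impulse capture range. A minimal sketch, with illustrative names:

```python
PERIODS_PS = [860, 880, 900, 920, 940]  # bit periods of the five levels


def data_rate_gbps(period_ps: float) -> float:
    """Data rate in Gbps for a bit period given in picoseconds."""
    return 1000.0 / period_ps


rates = [data_rate_gbps(t) for t in PERIODS_PS]
for period, rate in zip(PERIODS_PS, rates):
    print(f"T = {period} ps -> {rate:.3f} Gbps")

# Frequency jump between adjacent levels, in MHz.
steps_mhz = [(hi - lo) * 1000.0 for hi, lo in zip(rates, rates[1:])]
print("adjacent steps (MHz):", [round(s, 1) for s in steps_mhz])
```

Each step between adjacent levels is below 30 MHz, which is consistent with the CRC regaining lock after every jump.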



Figure 9.30 Configuration 2 for testing the impulse capture range



Figure 9.31 Control voltage to VCO at 25 °C



Figure 9.32 DFT of the recovered clocks at 25 °C



Figure 9.33 Control voltage to the VCO at 60 °C



Figure 9.34 DFT of the recovered clocks at 60 °C

The acquisition time can be measured on the waveform, which is about  $0.3 \mu s$ .

To measure the lock range, the configuration of Figure 9.25 is used again, but now we apply a PWL step voltage to VCO1. The signal ranges from 0.95 V to 1.2 V at 25 °C, corresponding to a frequency range from 1.2 GHz to 850 MHz. The control voltage is shown in Figure 9.35, from which we can see that the VCO control voltage and the duty cycle control voltage track the PWL source. The Fourier transform of the output clock is shown in Figure 9.36, which gives a lock range of 1.25 GHz-1.0 GHz = 250 MHz, equivalent to 20% of the nominal data rate. Figure 9.37 and Figure 9.38 show that the data regenerated at the lowest and highest frequencies are exactly the same as the input data.



Figure 9.35 Control voltage at 25 °C



Figure 9.36 DFT of the clocks at the highest and Lowest Frequency



Figure 9.37 Time piece at the lowest frequency, at 25 °C



Figure 9.38 Time piece at the highest frequency, at 25 °C

Simulations were also done at 60 °C and the waveforms are shown in Figure 9.39-Figure 9.42. The results show that the control voltage to the VCO tracks the PWL step voltage, and that the data at the highest and lowest frequency slots were both regenerated correctly. The lock range measured from Figure 9.40 is 1.15 GHz-930 MHz = 220 MHz.

From the simulations at 25 °C and 60 °C, we can see that the performance of the CRC degrades as the temperature increases. The reason is that, at higher temperature, the mobility of electrons and holes drops, so the delay, rise time and fall time all increase and the VCO tuning range shifts downwards. As a result, the system response becomes slower.



Figure 9.39 Control voltage at 60 °C



Figure 9.40 DFT of the clock at highest and lowest frequency



Figure 9.41 Time piece at the lowest frequency, at 60 °C



Figure 9.42 Time piece of the highest frequency, at 60 °C

# 10 SUMMARY AND CONCLUSION

Part II mainly focuses on a clock recovery circuit designed for a Gbps transceiver. First, some basic PLL theory was introduced to give insight into the working scheme. Then the components of the CRC were proposed and discussed. The self-adjusted 50% duty cycle VCO provides the fully in-phase differential clock for the phase detector. The phase detector has a simple architecture so that it can reach a speed above 1 Gbps. The modified charge pump has symmetric inputs and architecture, which gives better linearity of the transfer function. Simulations of each component gave its performance and, finally, a closed loop simulation was done. The system characteristics estimated from the simulation are listed in Table 10.1, which shows that the design goals were basically met.

Table 10.1 Measured performance of the designed CRC

| Parameter             | Value               |
|-----------------------|---------------------|
| Power dissipation     | 170 mW              |
| $K_{VCO}$             | 2π×800M rad/(s·V)   |
| $K_{PD}$              | 32.26 μA/π          |
| Impulse capture range | 60 MHz peak-to-peak |
| Capture time          | 300 ns              |
| Lock range            | >200 MHz            |
| Loop bandwidth        | 5.77 MHz            |
| Damping factor        | 0.433               |

# 11 GENERAL CONCLUSIONS AND FUTURE WORKS

## 11.1 Conclusion

Driven by the demand for higher levels of integration, lower power dissipation and lower cost, full CMOS solutions for high speed serial data communication are becoming the trend. This work reviewed and investigated the design of two building blocks for a Gbps transceiver using CMOS technology. At the transmitter end, a novel scheme and architecture for parallel-to-serial conversion was proposed to achieve the target data rate while lowering the power dissipation. The pipelined data loading ensures that every bit has enough time to be valid after being output. The multiple serial data paths ease the design of the VCO and the control logic. At the receiver end, a self-adjusted 50% duty cycle VCO was designed so that the phase detector gives its best performance. A two-NMOS-input charge pump compatible with the outputs of the phase detector was introduced. Some layout considerations to optimize the high speed performance were also discussed. The system performance was measured from simulation and the design goals were met.

This work discussed circuit designs for high speed serial data communication with the focus on two functional blocks, the parallel-to-serial converter and clock recovery circuit. The ideas proposed can be used or adopted in future designs in similar applications.

## 11.2 Future Works

The transmitter has been sent for fabrication and will be tested in the near future. Although the presented architecture inherently dissipates less dynamic power, the ratioed D flip-flop always dissipates static power regardless of the bit stored. Modifications to its structure can be explored to reduce the power while maintaining the same performance. The 5-1 MUX is the throughput bottleneck of the converter, and the simple PMOS transistor is probably not the best solution. A new architecture that combines the MUX with pseudo-NMOS logic is being investigated and will be evaluated. If this bottleneck is removed, the architecture has the potential to work at 2 Gbps.

Clock recovery from 1 Gbps data is a great challenge for CMOS design. The self-adjusted 50% duty cycle VCO has a high-order feedback loop, which becomes unstable at certain higher frequencies; the feedback loop can be investigated further and improved. A phase detector alone is not enough to achieve faster acquisition. As mentioned before, a frequency detector can be added to help capture the nominal frequency, which requires an accurate external reference clock and a switching operation between the phase detector and the frequency detector. As a side benefit, the startup circuit can then be eliminated. The impulse capture range of 60 MHz peak-to-peak is also not sufficient, and further effort can be spent on increasing it.

# REFERENCES

- [1] Byungsoo Chang, Joonbae Park, and Wonchan Kim, A 1.2GHz Dual-Modulus Prescaler Using New Dynamic D-Type Flip-Flops, *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 5, pp. 749-752, May 1996.
- [2] VSC7115/7116 Data sheet, VITESSE Semiconductor Inc., Camarillo, CA
- [3] Dao-Long Chen, Robert Waldron, A Single-Chip 266Mb/s CMOS Transmitter/Receiver for Serial Data Communication, ISSCC Digest of Technical Papers, pp.100-101, 1993.
- [4] John F. Ewen, Albert X. Widmer, Mehmet Soyuer, Kevin R. Wrenner, Ben Parker, Herschel A. Ainspan, Single-Chip 1062Mbaud CMOS Transceiver for Serial Data Communication, ISSCC Digest of Technical Papers, pp.32-33, 1995.
- [5] Alan Fiedler, Ross Mactaggart, James Welch, Shoba Krishnan, A 1.0625Gbps Transceiver with 2x-Oversampling and Transmit Signal Pre-Emphasis, ISSCC Digest of Technical Papers, pp. 238-239, 1997.
- [6] Dao-Long Chen, Michael O. Baker, A 1.25Gb/s 480mW CMOS Transceiver for Serial Data Communication, ISSCC Digest of Technical Papers, pp. 242-243, 1997.
- [7] Dan H. Wolaver, *Phase-Locked Loop Circuit Design*, Prentice Hall, Englewood Cliffs, New Jersey, 1991.
- [8] Behzad Razavi, Design of Monolithic Phase-Locked Loops and Clock Recovery Circuits-A tutorial, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, IEEE Press, Piscataway, NJ, 1996
- [9] Charles R. Hogge, A Self Correcting Clock Recovery Circuit, *Journal of Lightwave Technology*, Vol. LT-3, pp. 1312-1314, December 1985.
- [10] A. Khursheed Enam, Asad A. Abidi, NMOS IC's for Clock and Data Regeneration in Gigabit-per-Second Optical-Fiber Receivers, IEEE Journal of Solid-State Circuits, Vol. 27, pp.1763-1774, December 1992.
- [11] Floyd M. Gardner, Charge-Pump Phase-Lock Loops, *IEEE Transactions on Communications*, Vol. COM-28, pp. 1849-1858, November 1980.
- [12] Mehmet Soyuer, A Monolithic 2.3-Gb/s 100-mW Clock and Data Recovery Circuit in Silicon Bipolar Technology, *IEEE Journal of Solid-State Circuits*, Vol. 28, pp. 1310-1313, December 1993.
- [13] Barry Thompson, Hae-Seung Lee, Lawrence M. Devito, A 300-MHz BiCMOS Serial Data Transceiver, *IEEE Journal of Solid-State Circuits*, Vol. 29, pp. 185-192, March 1994.
- [14] Ramon S. Co, J. H. Mulligan, Jr, Optimization of Phase-Locked Loop Performance in Data Recovery Systems, *IEEE Journal of Solid-State Circuits*, pp. 1022-1034, September 1994.
- [15] Dao-Long Chen, A Power and Area Efficient CMOS Clock/Data Recovery Circuit For High-Speed Serial Interface, *IEEE Journal of Solid-State Circuits*, Vol. 31, pp.1170-1176, August 1996.
- [16] Todd C. Weigandt, Beomsup Kim, Paul R. Gray, Analysis of Timing Jitter in CMOS Ring Oscillators, International Symposium on Circuits and Systems (ISCAS), June, 1994.
- [17] John A. McNeill, Jitter in Ring Oscillators, *IEEE Journal of Solid-State Circuits*, Vol. 32, pp. 870-879, June 1997.
- [18] M. Afghahi, A Robust Single Phase Clocking For Low Power, High-Speed VLSI Application, IEEE Journal of Solid-State Circuits, Vol. 31, No. 2, pp.247-254, February 1996.
- [19] Qiuting Huang, Robert Rogenmoser, Speed Optimization of Edge-Triggered CMOS Circuits for Gigahertz Single Phase Clocks, *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 3, pp. 456-465, March, 1996.
- [20] S2052 Fiber Channel Transceiver Datasheet, AMCC Corp, San Diego, CA
- [21] David Johns & Ken Martin, Analog Integrated Circuit Design, John Wiley & Sons Inc. New York, NY, 1997.
- [22] H. Ransijn and P. O'Connor, A PLL-based 2.5-Gb/s GaAs clock and data regenerator IC, IEEE Journal of Solid-State Circuits, vol. 26, pp. 1345-1353, Oct. 1991.
- [23] R. Walker et al., A 2-chip 1.5 Gb/s bus-oriented serial link interface, ISSCC Digest of Technical Papers, pp. 226-227, Feb. 1992.
- [24] Albert X. Widmer et al., Single-chip 4x500 MBd CMOS Transceiver, IEEE Journal of Solid-State Circuits, Vol. 31, No. 12, December, 1996.